THE ANNALS 

of 

MATHEMATICAL 

STATISTICS 

FOUNDED AND EDITED BY H. C. CARVER, 1930-1938
EDITED BY S. S. WILKS, 1938-1949

The Official Journal of the Institute of
Mathematical Statistics


VOLUME XXI 


1950 





Editor
T. W. ANDERSON

Associate Editors
R. C. BOSE
W. FELLER
M. A. GIRSHICK
E. L. LEHMANN
ALEXANDER M. MOOD
JOHN W. TUKEY

With the Cooperation of
M. S. Bartlett
David Blackwell
George W. Brown
Harald Cramér
William G. Cochran
J. F. Daly
W. Edwards Deming
J. L. Doob
Paul S. Dwyer
Churchill Eisenhart
T. E. Harris
Paul G. Hoel
Harold Hotelling
Howard Levene
William G. Madow
H. B. Mann
Frederick Mosteller
J. Neyman
H. E. Robbins
S. N. Roy
Henry Scheffé
Walter A. Shewhart
A. Wald
Jacob Wolfowitz
Max A. Woodbury


Published quarterly by the Institute of Mathematical Statistics in March, June,
September and December at Baltimore, Maryland


INSTITUTE OF MATHEMATICAL STATISTICS

General Office: Business Administration Building, University of Michigan,
Ann Arbor, Michigan

C. H. Fischer, Secretary-Treasurer

This address should be used for all communications concerning
membership, subscriptions, changes of address, back numbers,
etc., but not for editorial correspondence. Changes in mailing
address which are to become effective for a given issue should be
reported to the Secretary on or before the 15th of the month
preceding the month of that issue.

Editorial Office: Department of Mathematical Statistics, Columbia University,
New York 27, New York

T. W. Anderson, Editor 

Manuscripts should be submitted to this address; each manuscript
should be typewritten, double-spaced with wide margins,
and the original copy should be submitted (preferably with one
additional copy). Footnotes should be reduced to a minimum
and whenever possible replaced by a bibliography at the end of
the paper; formulae in footnotes should be avoided. Figures, charts,
and diagrams should be professionally drawn on plain white
paper or tracing cloth in black India ink twice the size they
are to be printed. Authors are requested to keep in mind typographical
difficulties of complicated mathematical formulae.
Authors will ordinarily receive only galley proofs. Fifty reprints
without covers will be furnished free. Additional reprints and
covers furnished at cost.

Subscription Price: $10.00 per year inside the Western Hemisphere; $5.00 elsewhere.
Single issues $3.00. Back numbers are available at $10.00 per volume
or $3.00 per single issue.

Composed and Printed at the
WAVERLY PRESS, Inc., Baltimore, Maryland, U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879.

Copyright, 1950, by the Institute of Mathematical Statistics.


CONTENTS OF VOLUME XXI

ARTICLES

Anderson, R. L., and T. W. Anderson. Distribution of the Circular Serial
    Correlation Coefficient for Residuals from a Fitted Fourier Series . . . 59
Anderson, T. W., and R. L. Anderson. Distribution of the Circular Serial
    Correlation Coefficient for Residuals from a Fitted Fourier Series . . . 59
Anderson, T. W., and Herman Rubin. The Asymptotic Properties of Estimates
    of the Parameters of a Single Equation in a Complete System of
    Stochastic Equations . . . 570
Bahadur, Raghu Raj. On a Problem in the Theory of k Populations . . . 362
Bahadur, Raghu Raj, and Herbert Robbins. The Problem of the Greater
    Mean . . . 469
Barankin, E. W. Extension of a Theorem of Blackwell . . . 280
Birnbaum, Z. W. Effect of Linear Truncation on a Multinormal Population . . . 272
Birnbaum, Z. W., and D. G. Chapman. On Optimum Selections from
    Multinormal Populations . . . 443
Blomqvist, Nils. On a Measure of Dependence between Two Random
    Variables . . . 593
Carpenter, Osmer. Note on the Extension of Craig's Theorem to Non-Central
    Variates . . . 455
Castellani, Maria. On Multinomial Distributions with Limited Freedom:
    A Stochastic Genesis of Pareto's and Pearson's Curves . . . 289
Chand, Uttam. Distributions Related to Comparison of Two Means and
    Two Regression Coefficients . . . 507
Chapman, Douglas G. Some Two Sample Tests . . . 601
Chapman, D. G., and Z. W. Birnbaum. On Optimum Selections from
    Multinormal Populations . . . 443
Cohen, A. C., Jr. Estimating the Mean and Variance of Normal Populations
    from Singly Truncated and Doubly Truncated Samples . . . 557
Davis, R. C. Derivation of a Broad Class of Consistent Estimates . . . 425
Dixon, W. J. Analysis of Extreme Values . . . 488
Erdös, P. Remark on my Paper "On a Theorem of Hsu and Robbins" . . . 138
Feller, W. Errata . . . 301
Freeman, Murray F., and John W. Tukey. Transformations Related to
    the Angular and the Square Root . . . 607
Glasgow, Mark O., and Robert E. Greenwood. Distribution of Maximum
    and Minimum Frequencies in a Sample Drawn from a Multinomial
    Distribution . . . 416
Greenwood, Robert E., and Mark O. Glasgow. Distribution of Maximum
    and Minimum Frequencies in a Sample Drawn from a Multinomial
    Distribution . . . 416
Greville, T. N. E. Remark on W. M. Kincaid's "Note on the Error in
    Interpolation of a Function of Two Independent Variables" . . . 137


Greville, T. N. E. Remark on the Article "On a Class of Distributions
    that Approach the Normal Distribution Function" by George B.
    Dantzig . . . 611
Grubbs, Frank E. Sample Criteria for Testing Outlying Observations . . . 27
Gumbel, E. J., and R. D. Keeney. The Geometric Range for Distributions
    of Cauchy's Type . . . 133
Gumbel, E. J., and R. D. Keeney. The Extremal Quotient . . . 523
Gumbel, E. J., and H. von Schelling. The Distribution of the Number of
    Exceedances . . . 247
Hammersley, J. M. The Distribution of Distance in a Hypersphere . . . 447
Hodges, J. L., Jr., and E. L. Lehmann. Some Problems in Minimax Point
    Estimation . . . 182
Howell, John M. Errata to "Control Chart for Largest and Smallest
    Values" . . . 615
Katz, Leo. On the Relative Efficiencies of BAN Estimates . . . 398
Kawada, Yukiyosi. Independence of Quadratic Forms in Normally
    Correlated Variables . . . 614
Keeney, R. D., and E. J. Gumbel. The Geometric Range for Distributions
    of Cauchy's Type . . . 133
Keeney, R. D., and E. J. Gumbel. The Extremal Quotient . . . 523
Kimball, Bradford F. On the Asymptotic Distribution of the Sum of Powers
    of Unit Frequency Differences . . . 263
Koopmans, T. C., and O. Reiersøl. The Identification of Structural
    Characteristics . . . 165
Krishna Iyer, P. V. The Theory of Probability Distributions of Points on
    a Lattice . . . 198
Lehmann, E. L. Some Principles of the Theory of Testing Hypotheses . . . 1
Lehmann, E. L., and J. L. Hodges, Jr. Some Problems in Minimax Point
    Estimation . . . 182
Lehmann, E. L., and Charles Stein. Completeness in the Sequential Case . . . 376
Link, Richard F. The Sampling Distribution of the Ratio of Two Ranges
    from Independent Samples . . . 112
Loève, M. Fundamental Limit Theorems of Probability Theory . . . 321
Massey, Frank J., Jr. A Note on the Estimation of a Distribution Function
    by Confidence Limits . . . 116
Massey, F. J., Jr. A Note on the Power of a Non-Parametric Test . . . 440
Morrison, Winifred J., and Jack Sherman. Adjustment of an Inverse
    Matrix Corresponding to a Change in One Element of a Given Matrix . . . 124
Mosteller, Frederick, and John W. Tukey. Significance Levels for a
    k-Sample Slippage Test . . . 120
Nanda, D. N. Distribution of the Sum of Roots of a Determinantal Equation
    Under a Certain Condition . . . 432
Narain, R. D. On the Completely Unbiassed Character of Tests of
    Independence in Multivariate Normal Systems . . . 293
Noack, Albert. A Class of Random Variables with Discrete Distributions . . . 127






















Noether, Gottfried Emanuel. Asymptotic Properties of the Wald-Wolfowitz
    Test of Randomness . . . 231
Paull, A. E. On a Preliminary Test for Pooling Mean Squares in the Analysis
    of Variance . . . 539
Pillai, K. C. S. On the Distributions of Midrange and Semi-Range in Samples
    from a Normal Population . . . 100
Reiersøl, O., and T. C. Koopmans. The Identification of Structural
    Characteristics . . . 165
Robbins, Herbert, and Raghu Raj Bahadur. The Problem of the Greater
    Mean . . . 469
Rubin, Herman, and T. W. Anderson. The Asymptotic Properties of Estimates
    of the Parameters of a Single Equation in a Complete System
    of Stochastic Equations . . . 570
Scott, Elizabeth L. Note on Consistent Estimates of the Linear Structural
    Relation Between Two Variables . . . 284
Seth, G. R. On the Distribution of the Two Closest Among a Set of Three
    Observations . . . 298
Sherman, B. A Random Variable Related to the Spacing of Sample Values . . . 339
Sherman, Jack, and Winifred J. Morrison. Adjustment of an Inverse
    Matrix Corresponding to a Change in One Element of a Given Matrix . . . 124
Shrikhande, S. S. The Impossibility of Certain Symmetrical Balanced
    Incomplete Block Designs . . . 106
Stein, Charles. Unbiased Estimates with Minimum Variance . . . 406
Stein, Charles, and E. L. Lehmann. Completeness in the Sequential
    Case . . . 376
Tukey, John W., and Murray F. Freeman. Transformations Related to
    the Angular and the Square Root . . . 607
Tukey, John W., and Frederick Mosteller. Significance Levels for a
    k-Sample Slippage Test . . . 120
von Schelling, Hermann. A Second Formula for the Partial Sum of
    Hypergeometric Series Having Unity as the Fourth Argument . . . 458
von Schelling, H., and E. J. Gumbel. The Distribution of the Number of
    Exceedances . . . 247
Wald, A., and J. Wolfowitz. Bayes Solutions of Sequential Decision
    Problems . . . 82
Walsh, John E. Some Estimates and Tests Based on the r Smallest Values
    in a Sample . . . 386
Walsh, John E. Some Nonparametric Tests of Whether the Largest
    Observations of a Set are Too Large or Too Small . . . 583
Wolfowitz, J. Minimax Estimates of the Mean of a Normal Distribution
    with Known Variance . . . 218
Wolfowitz, J., and A. Wald. Bayes Solutions of Sequential Decision
    Problems . . . 82
Zeigler, R. K. A Note on the Asymptotic Simultaneous Distribution of the
    Sample Median and the Mean Deviation from the Sample Median . . . 452

























MISCELLANEOUS

Abstracts of Papers . . . 139, 302, 461, 616
Minutes of the Annual Membership Meeting, New York, December 28, 1949 . . . 163
News and Notices . . . 147, 315, 462, 620
Report of the Berkeley Meeting of the Institute . . . 622
Report of the Chapel Hill Meeting of the Institute . . . 317
Report of the Chicago Meeting of the Institute . . . 467
Report of the New York Meeting of the Institute . . . 151
Report of the President of the Institute . . . 155
Report of the Secretary-Treasurer of the Institute . . . 160
Report of the Editor of the Annals . . . 162








SOME PRINCIPLES OF THE THEORY OF TESTING HYPOTHESES¹

By E. L. Lehmann
University of California, Berkeley

Introduction

1. The likelihood ratio principle. The development of a theory of hypothesis
testing (as contrasted with the consideration of particular cases) may be said
to have begun with the 1928 paper of Neyman and Pearson [16]. For in this
paper the fundamental fact is pointed out that in selecting a suitable test one
must take into account not only the hypothesis but also the alternatives against
which the hypothesis is to be tested, and on this basis the likelihood ratio
principle is proposed as a generally applicable criterion. This principle has proved
extremely successful; nearly all tests now in use for testing parametric
hypotheses are likelihood ratio tests (for an extension to the non-parametric case
see [33]), and many of them have been shown to possess various optimum
properties.

At least in the parametric case the likelihood ratio test has a number of
desirable properties. Among these we mention:

(i) Frequently it is easy to apply and leads to a definite and reasonable test.

(ii) If the sample size is large, and if certain regularity conditions are satisfied,
an approximate solution can be given for the distribution problems that arise
in the determination of size and power of the test (Wilks [32], Wald [25]). In
fact, if the likelihood ratio is denoted by $\lambda$, $-2 \log \lambda$ approximately has a central
$\chi^2$-distribution under the hypothesis, a non-central $\chi^2$-distribution under the
alternatives. The number of degrees of freedom in these distributions equals the
number of constraints imposed by the hypothesis.

(iii) As was shown by Wald [25], under certain restrictions the likelihood ratio
test possesses various pleasant large sample properties.
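
As a rough numerical illustration of property (ii), the sketch below simulates the likelihood ratio test of a simple hypothesis about the rate of an exponential distribution and checks how often $-2 \log \lambda$ exceeds the upper 5% point of the central $\chi^2$-distribution with one degree of freedom. The exponential model, the sample size, the number of replications and all function names are assumptions chosen for illustration; none of them comes from the paper.

```python
import math
import random


def neg2_log_lr_exponential(sample, rate0=1.0):
    """-2 log(lambda) for H: rate = rate0 against rate != rate0 (exponential data)."""
    n = len(sample)
    mean = sum(sample) / n
    rate_hat = 1.0 / mean  # unrestricted maximum likelihood estimate of the rate
    # Log-likelihood of an exponential sample: n*log(rate) - rate*sum(x).
    loglik_h = n * math.log(rate0) - rate0 * n * mean
    loglik_max = n * math.log(rate_hat) - rate_hat * n * mean
    return 2.0 * (loglik_max - loglik_h)


def rejection_rate(n=200, reps=4000, seed=0):
    """Fraction of simulated samples, drawn under H, in which H is rejected."""
    rng = random.Random(seed)
    crit = 3.841458820694124  # upper 5% point of chi-square with 1 degree of freedom
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        if neg2_log_lr_exponential(sample) > crit:
            hits += 1
    return hits / reps


rate = rejection_rate()
print(f"empirical size of the nominal 5% test: {rate:.3f}")
```

The hypothesis imposes one constraint, so one degree of freedom is used; the empirical rejection rate should lie close to the nominal level when the sample size is large.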

In view of this, one may feel that the likelihood ratio principle, although
perhaps not always leading to the optimum test, is completely satisfactory, and
that a more systematic study of the problem of test selection is not necessary.
Unfortunately, against the pleasant properties just mentioned there stands a
very unpleasant one. Cases exist in which the likelihood ratio test is not only
unsatisfactory but worse than useless, and hence the likelihood ratio principle
is not reliable. Examples of this kind were constructed independently by H.
Rubin and C. Stein; the following is Stein's example.

¹ Parts of this paper were presented in an invited address at the meeting of the Institute
of Mathematical Statistics on Dec. 30, 1948, in Cleveland, Ohio.





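Let $X$ be a random variable capable of taking on the values $0, \pm 1, \pm 2$ with
probabilities as indicated:

$$
\begin{array}{lccccc}
 & -2 & 2 & -1 & 1 & 0 \\
\text{Hypothesis } H: & \dfrac{\alpha}{2} & \dfrac{\alpha}{2} & \dfrac{1}{2} - \alpha & \dfrac{1}{2} - \alpha & \alpha \\
\text{Alternatives:} & pC & (1 - p)C & \dfrac{1 - C}{1 - \alpha}\left(\dfrac{1}{2} - \alpha\right) & \dfrac{1 - C}{1 - \alpha}\left(\dfrac{1}{2} - \alpha\right) & \alpha\,\dfrac{1 - C}{1 - \alpha}
\end{array}
$$

Here $\alpha$, $C$ are constants, $0 < \alpha \le \tfrac{1}{2}$, $\dfrac{\alpha}{2 - \alpha} < C < \alpha$, and $p$ ranges over the
interval $[0, 1]$.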

It is desired to test the hypothesis $H$ at significance level $\alpha$. The likelihood
ratio test rejects when $X = \pm 2$, and hence its power is $C$ against each alternative.
Since $C < \alpha$, this test is literally worse than useless, for a test with power
$\alpha$ can be obtained without observing $X$ at all, simply by the use of a table of
random numbers. It is worth noting that the test which rejects $H$ when $X = 0$
has power $\alpha \dfrac{1 - C}{1 - \alpha} > \alpha$, so that a reasonable test of the hypothesis in
question does exist.
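
The arithmetic of Stein's example can be checked directly. The sketch below tabulates the two sets of probabilities for one illustrative choice of the constants, finds the points at which the likelihood ratio is smallest, and compares the size and power of the likelihood ratio test with those of the test that rejects when $X = 0$. The numerical values of $\alpha$, $C$ and $p$ and the function names are assumptions made for illustration.

```python
from fractions import Fraction

POINTS = (-2, 2, -1, 1, 0)


def stein_example(alpha, C, p):
    """Return the null and alternative probability tables of Stein's example."""
    half = Fraction(1, 2)
    null = {-2: alpha / 2, 2: alpha / 2, -1: half - alpha, 1: half - alpha, 0: alpha}
    scale = (1 - C) / (1 - alpha)
    alt = {-2: p * C, 2: (1 - p) * C,
           -1: scale * (half - alpha), 1: scale * (half - alpha), 0: scale * alpha}
    return null, alt


def likelihood_ratio(alpha, C, x):
    """Sup over H divided by sup over H and all alternatives, evaluated at x."""
    null, _ = stein_example(alpha, C, Fraction(0))
    # Alternative probabilities are linear in p, so the sup is attained at p = 0 or 1.
    sup_alt = max(stein_example(alpha, C, Fraction(p))[1][x] for p in (0, 1))
    return null[x] / max(null[x], sup_alt)


def rejection_probability(dist, region):
    return sum(dist[x] for x in region)


# Illustrative constants satisfying alpha/(2 - alpha) < C < alpha.
alpha, C = Fraction(1, 10), Fraction(2, 25)
lr = {x: likelihood_ratio(alpha, C, x) for x in POINTS}
lr_region = [x for x in POINTS if lr[x] == min(lr.values())]  # rejected first

for p in (Fraction(0), Fraction(1, 3), Fraction(1)):
    null, alt = stein_example(alpha, C, p)
    print("p =", p,
          "| LR test: size", rejection_probability(null, lr_region),
          "power", rejection_probability(alt, lr_region),
          "| reject at X = 0: size", rejection_probability(null, [0]),
          "power", rejection_probability(alt, [0]))
```

Exact rational arithmetic is used so that the comparison of power with $\alpha$ does not depend on rounding.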

The existence of such examples gives added importance to the problem of
developing a systematic theory of hypothesis testing. It is the purpose of the
present paper to give a brief survey of the work done on some aspects of such a
theory and to indicate certain extensions and modifications of the existing theory.
Some examples and applications will be considered. These will be restricted to
parametric problems. For applications to testing non-parametric hypotheses
see [12].


The results of sections 6 and 8 were obtained jointly by Gilbert Hunt and
Charles Stein in 1945. They have not been published and were communicated
to me by Professor Stein. I should like to express to him my gratitude for
acquainting me with this material and for giving me permission to include it in
this paper. I should also like to acknowledge my indebtedness to Professor
Henry Scheffé who read the manuscript and made many helpful suggestions.

2. Formulation of the problem. The problem of testing a statistical hypothesis
was formulated by Neyman and Pearson [18] as follows.

A random variable $X$ is known to be distributed over a space $\mathfrak{X}$ according to
some member of a family of probability distributions $\{P_\theta\}$, $\theta \in \Omega$. It will be
assumed here that there is specified an additive class $\mathfrak{B}$ of sets in $\mathfrak{X}$, and that
the probability distributions $P_\theta$ are probability measures defined over $\mathfrak{B}$. All
sets or real valued functions mentioned in this paper will be assumed
measurable $\mathfrak{B}$ unless otherwise stated. If $B \in \mathfrak{B}$, we shall write for the measure
assigned to $B$ by $P_\theta$ interchangeably $P_\theta(X \in B)$, $P_\theta(B)$, and if there is no
possibility [...]

Throughout most of the paper it will be assumed
that the probability measures $P_\theta$ are absolutely continuous with respect to a
given sigma finite measure $\mu$ defined over $\mathfrak{B}$, so that there exist non-negative
functions $f_\theta$ such that

(2.1)    $P_\theta(B) = \int_B f_\theta(x)\, d\mu(x).$

We shall then say that $f_\theta(x)$ is a generalized probability density w.r. to $\mu$.

A statistical hypothesis $H$ specifies a subset $\omega$ of $\Omega$, and states that the
distribution of $X$ is some $P_\theta$ with $\theta \in \omega$. A test of $H$ is any subset $w$ of $\mathfrak{X}$, the
convention being that $H$ is rejected if the observed value $x$ of $X$ is in $w$, and that
in the contrary case $H$ is accepted. The selection of $w$ is to be made as follows.
A number $\alpha$ is given, $0 < \alpha < 1$, the level of significance, and $w$ must be such
that

(2.2)    $P_\theta(w) = \alpha$ for all $\theta \in \omega$.

Subject to this restriction it is desired to maximize $P_\theta(w)$ for $\theta$ in $\Omega - \omega$. The
interpretation of these conditions is immediate. Since $P_\theta(w)$ is the probability
of rejecting $H$ computed under the assumption that $P_\theta$ is the distribution of
$X$, equation (2.2) states that the probability of rejecting $H$ is to be $\alpha$ (usually
some small number such as .01 or .05) whenever $H$ is true. Similarly the second
condition expresses the fact that $H$ is to be rejected with high probability when
$\theta$ is in $\Omega - \omega$.

Naturally the second condition is not to be taken literally but rather as a
loosely stated principle of choice. For in general there will exist a unique set
$w$ maximizing $P_{\theta_1}(w)$ for any given $\theta_1 \in \Omega - \omega$, but this $w$ will change with $\theta_1$.
The condition has a clear meaning only in the case that the set $\Omega - \omega$ contains
only a single point, and in a few special problems in which the same set $w$
maximizes $P_\theta(w)$ for all $\theta \in \Omega - \omega$. In the general case there are available two main
methods for making the condition precise. One may restrict consideration to
some class of "nice" tests, so that within this class the maximization of $P_\theta(w)$
can be achieved uniformly for $\theta \in \Omega - \omega$. Alternatively, instead of asking that
a local optimum property hold uniformly, one may look for a test whose power
function possesses some optimum property in the large. Both of these
approaches have an element of arbitrariness: in the first, the selection of a class
of nice tests; in the second, the choice of an appropriate optimum property.
Fortunately, in a number of important special cases, both methods, for various
reasonable definitions, lead to the same test.

Before proceeding with this development, we shall modify the formulation
of the problem slightly. First, as has been pointed out by many writers, it seems
more natural to replace (2.2) by

(2.3)    $P_\theta(w) \le \alpha$ for all $\theta \in \omega$.

Secondly, we shall permit "randomized" tests (see [11, 29]), that is, instead of
demanding that the statistician decide for each value of $x$ whether to accept
or to reject $H$, we shall allow the possibility that for certain $x$ the decision be
reached by means of some chance device such as a table of random numbers.
By a test of $H$ we shall therefore mean a function $\phi$ from $\mathfrak{X}$ to the interval
$[0, 1]$, with the convention that when $x$ is the observed value of $X$ some chance
experiment with two possible outcomes $R$, $\bar{R}$ will be performed such
that $P(R) = \phi(x)$, and that $H$ will be rejected when the outcome is $R$ and will
otherwise be accepted. The case of a non-randomized test $w$ clearly is obtained
as a special case by taking for $\phi$ the characteristic function of the set $w$.

For a test $\phi$ the probability of rejection is given by

(2.4)    $\int_{\mathfrak{X}} \phi(x)\, dP_\theta(x) = E_\theta \phi(X)$

where $E_\theta$ denotes expectation computed with respect to the probability
distribution $P_\theta$. We therefore obtain the following formulation of the problem:
To determine a test function $\phi$ ($0 \le \phi(x) \le 1$) which maximizes $E_\theta \phi(X)$, the
power of $\phi$ against the alternative $\theta$, for $\theta$ in $\Omega - \omega$ subject to the condition

(2.5)    $E_\theta \phi(X) \le \alpha$ for all $\theta \in \omega$.

In this connection it is convenient to use the term "level of significance" for
the preassigned number $\alpha$, and to define the size of the test $\phi$ as

(2.6)    $\sup_{\theta \in \omega} E_\theta \phi(X).$

Except in the trivial case that there exists a test of size $< \alpha$ whose power is 1
against all alternatives, the size of any optimum test (in fact, of any admissible
test) equals the level of significance.
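
To make the definitions (2.4) to (2.6) concrete, the following sketch evaluates the probability of rejection $E_\theta \phi(X)$ of a randomized test for a binomial observation, approximates the size as a maximum over a grid of parameter values in the hypothesis, and evaluates the power at one alternative. The binomial setting, the hypothesis $p \le 1/2$, the grid and all names are assumptions introduced only for illustration.

```python
from math import comb


def binom_pmf(n, p, x):
    return comb(n, x) * p**x * (1 - p)**(n - x)


def expected_phi(phi, n, p):
    """E_p phi(X) for X ~ Binomial(n, p): the probability of rejection, as in (2.4)."""
    return sum(phi(x) * binom_pmf(n, p, x) for x in range(n + 1))


n, alpha = 10, 0.05

# Randomize on the boundary point x = 8 so that the rejection probability
# at p = 1/2 is exactly alpha.
tail = sum(binom_pmf(n, 0.5, x) for x in (9, 10))
gamma = (alpha - tail) / binom_pmf(n, 0.5, 8)


def phi(x):
    """Randomized test: reject for x >= 9, reject with probability gamma at x = 8."""
    if x >= 9:
        return 1.0
    if x == 8:
        return gamma
    return 0.0


# Size (2.6): sup over the hypothesis p <= 1/2, approximated on a grid.
omega_grid = [i / 100 for i in range(0, 51)]
size = max(expected_phi(phi, n, p) for p in omega_grid)
power_at_0_8 = expected_phi(phi, n, 0.8)
print("gamma =", gamma, "size =", size, "power at p = 0.8:", power_at_0_8)
```

Without the randomization at $x = 8$ the attainable sizes would be restricted to a few discrete values, which is one practical reason for admitting test functions with values strictly between 0 and 1.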

3. Testing against a simple alternative. A complete solution of the problem
formulated in the last section is available only in the case that $\omega$ and $\Omega - \omega$
each contains only a single point, that is, in the case that both the hypothesis
and the alternative are simple. The solution is then given by the fundamental
lemma of Neyman and Pearson [18], which we may state in the following
slightly more complete form.

Theorem 3.1. Let

(3.1)    $P_{\theta_i}(A) = \int_A f_{\theta_i}(x)\, d\mu(x), \quad i = 0, 1.$

(a) For testing the hypothesis $H: \theta = \theta_0$ against the alternative $\theta = \theta_1$ at level of
significance $\alpha$, there exists a number $k$ and a test $\phi$ of size $\alpha$ such that

(3.2)    $\phi(x) = 1$ when $f_{\theta_1}(x) > k f_{\theta_0}(x)$,
         $\phi(x) = 0$ when $f_{\theta_1}(x) < k f_{\theta_0}(x)$.

(b) If $f_{\theta_0}(x)$ and $f_{\theta_1}(x)$ are $> 0$ for all $x$ in $\mathfrak{X}$, then a test $\phi$ is most powerful for
testing $H$ against $\theta = \theta_1$ if and only if it satisfies (3.2) except possibly on a set
of $\mu$-measure 0. (Note that the number $k$ of (3.2) is essentially unique.)²

² Throughout the paper we shall consider two tests as equal if they differ only on a set
of $\mu$-measure 0.


The second half of the theorem may be paraphrased by saying that under
the conditions stated the most powerful test is uniquely determined by (3.2)
except on the set on which

(3.3)    $f_{\theta_1}(x) = k f_{\theta_0}(x).$

On this set the value of $\phi$ may be assigned arbitrarily provided the resulting
test has size $\alpha$. If in particular the set on which (3.3) holds has measure 0, the
most powerful test is unique.

It should be mentioned that (3.1) is no restriction since any two probability
measures $P_1$, $P_2$ defined over a common additive class can be represented in
this form with $\mu = P_1 + P_2$. If the assumption of (b) is not satisfied, the
theorem is still true in essence but some trivial modifications are necessary.
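
For a finite sample space the test described in Theorem 3.1 can be constructed by ordering the sample points by the ratio $f_{\theta_1}(x) / f_{\theta_0}(x)$ and rejecting on the points with the largest ratios until the level is used up, randomizing on the boundary if necessary. The sketch below does this for two binomial distributions; the function `neyman_pearson_test`, the choice of distributions and the level are assumptions made for illustration, and the construction assumes $f_{\theta_0}(x) > 0$ at every point, as in part (b).

```python
from fractions import Fraction
from math import comb


def neyman_pearson_test(f0, f1, alpha):
    """Most powerful size-alpha test of f0 against f1 on a finite sample space.

    f0, f1 map each sample point to its probability (f0 > 0 everywhere).
    Returns (k, phi), where phi maps each point to its rejection probability
    and satisfies (3.2) with the constant k.
    """
    # Visit the points in order of decreasing likelihood ratio f1/f0.
    points = sorted(f0, key=lambda x: f1[x] / f0[x], reverse=True)
    phi = {x: Fraction(0) for x in f0}
    remaining = Fraction(alpha)  # part of the level not yet used
    k = None
    for x in points:
        if remaining <= 0:
            break
        k = f1[x] / f0[x]
        if f0[x] <= remaining:
            phi[x] = Fraction(1)
            remaining -= f0[x]
        else:
            # Boundary point: randomize so that the size is exactly alpha.
            phi[x] = remaining / f0[x]
            remaining = Fraction(0)
    return k, phi


def binomial(n, p):
    return {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}


f0 = binomial(5, Fraction(1, 2))   # simple hypothesis
f1 = binomial(5, Fraction(3, 4))   # simple alternative
k, phi = neyman_pearson_test(f0, f1, Fraction(1, 10))
size = sum(phi[x] * f0[x] for x in f0)
power = sum(phi[x] * f1[x] for x in f1)
print("k =", k, "phi =", phi, "size =", size, "power =", power)
```

Points at which the ratio equals $k$ form the set (3.3); the sketch assigns them values in the order in which they are visited, which is one of the arbitrary assignments the theorem permits.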

No such complete solution is available for the problem of testing a composite
hypothesis against a simple alternative. However, as was shown in [11], this
problem may in many cases be reduced to the one just considered. Let the
hypothesis state that $\theta$ is an element of $\omega$, and consider the simple alternative
$\theta = \theta_1$. Suppose that an additive class of sets has been defined on $\omega$ (in most
of the applications $\omega$ is a subset of Euclidean space, and the additive class is
formed by the Borel sets contained in $\omega$). Then for any probability distribution
$\lambda$ over $\omega$,

(3.4)    $h_\lambda(x) = \int_\omega f_\theta(x)\, d\lambda(\theta)$

is a probability density function with respect to $\mu$.

Under certain conditions to be stated below, the most powerful test $\phi_\lambda$ for
testing the simple hypothesis $H_\lambda$ that $X$ is distributed with probability density
$h_\lambda$ against the alternative $f_{\theta_1}$ is also most powerful for testing the original
hypothesis $H$ against the same alternative. This is essentially the Bayes approach
developed by Wald for his general decision theory, and in fact, under the
conditions which we shall state, $\lambda$ is a least favorable distribution over $\omega$ in the
following sense. Let $\beta_\lambda$ be the power of $\phi_\lambda$ against $f_{\theta_1}$, and for any distribution
$\lambda^*$ over $\omega$ denote by $H_{\lambda^*}$, $\phi_{\lambda^*}$, $\beta_{\lambda^*}$ the associated hypothesis, the most powerful
test for testing it against $f_{\theta_1}$, and the power of this test respectively. Then $\lambda$
is said to be least favorable if for all $\lambda^*$

(3.5)    $\beta_\lambda \le \beta_{\lambda^*}.$

Theorem 3.2. Suppose there exists a probability distribution $\lambda$ over $\omega$ such that
the most powerful test $\phi_\lambda$ of size $\alpha$ for testing $H_\lambda$ against $f_{\theta_1}$ is of size $\alpha$ also with
respect to the original hypothesis $H$. Then

(i) $\phi_\lambda$ is most powerful for testing $H$ against $f_{\theta_1}$;

(ii) $\lambda$ is a least favorable distribution.

Also, if $\phi_\lambda$ is the unique most powerful test for testing $H_\lambda$ against $f_{\theta_1}$, it is the
unique most powerful test for testing $H$ against $f_{\theta_1}$.

These results are essentially contained in Wald's work (see for example
theorem 4.8 of [26]).





There are many trivial applications of this theorem to finding most powerful
tests of one-sided hypotheses concerning a single real-valued parameter, such
as testing $H: p \le p_0$ against $p = p_1$ ($p_0 < p_1$) when $X$ has a binomial
distribution with parameter $p$. As is well known, it turns out in a number of these cases
that the most powerful tests are in fact uniformly most powerful against the
one-sided class of alternatives.
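
The binomial case can be checked numerically. In the sketch below the test is built from the distribution at $p_0$ alone; the code then verifies, for several values of $p_1 > p_0$, that the likelihood ratio is increasing in $x$ (so that the same upper-tail test is most powerful against each of them), and that the rejection probability over a grid of values $p \le p_0$ never exceeds the level. The values of $n$, $p_0$, $\alpha$, the grids and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def pmf(n, p, x):
    return comb(n, x) * p**x * (1 - p)**(n - x)


def upper_tail_test(n, p0, alpha):
    """Return (c, gamma): reject for x > c, reject with probability gamma at x = c,
    so that the rejection probability at p0 is exactly alpha."""
    remaining = alpha
    for c in range(n, -1, -1):
        mass = pmf(n, p0, c)
        if mass > remaining:
            return c, remaining / mass
        remaining -= mass
    return -1, Fraction(0)  # alpha >= 1: reject always


def power(n, p, c, gamma):
    """Rejection probability of the upper-tail test when the parameter is p."""
    boundary = gamma * pmf(n, p, c) if c >= 0 else 0
    return sum(pmf(n, p, x) for x in range(c + 1, n + 1)) + boundary


n, p0, alpha = 10, Fraction(1, 2), Fraction(1, 20)
c, gamma = upper_tail_test(n, p0, alpha)

# The likelihood ratio is increasing in x for every p1 > p0, so the ordering of
# sample points used by the fundamental lemma does not depend on p1.
for p1 in (Fraction(3, 5), Fraction(3, 4), Fraction(9, 10)):
    ratios = [pmf(n, p1, x) / pmf(n, p0, x) for x in range(n + 1)]
    assert all(a < b for a, b in zip(ratios, ratios[1:]))

# Size with respect to the composite hypothesis p <= p0, on a grid.
null_grid = [Fraction(i, 20) for i in range(0, 11)]
size = max(power(n, p, c, gamma) for p in null_grid)
print("c =", c, "gamma =", gamma, "size over the grid =", size)
```

In the language of Theorem 3.2, the distribution $\lambda$ that puts all its mass at $p_0$ plays the role of the least favorable distribution in this example.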

In [11] Theorem 3.2 was used to determine most powerful tests of certain
hypotheses concerning normal distributions. As an example consider the case
that $X_1, \dots, X_n$ are independently normally distributed with common mean
$\xi$ and variance $\sigma^2$. Denote by $H_1$ and $H_2$ the hypotheses $\sigma = 1$ and $\xi = 0$
respectively, and let the alternative be: $\xi = \xi_1$, $\sigma^2 = \sigma_1^2$. Then the most powerful
test of $H_1$ rejects if

(3.6)    $\Sigma (x_i - \xi_1)^2 < b_1$ when $\sigma_1 < 1$,
         $\Sigma (x_i - \bar{x})^2 > c_1$ when $\sigma_1 > 1$,

and accepts otherwise. Here $b_1$ and $c_1$ depend only on the level of significance,
that is, are independent of $\xi_1$, $\sigma_1$. If $\xi_1 > 0$, the most powerful test for testing
$H_2$ rejects if

(3.7)    $\Sigma (x_i - b)^2 \le b_2$ when $\alpha < \tfrac{1}{2}$,
         $\dfrac{\bar{x}}{\sqrt{\Sigma (x_i - \bar{x})^2}} \ge c_2$ when $\alpha \ge \tfrac{1}{2}$,

and accepts otherwise. Here $b_2$ and $c_2$ depend only on $\alpha$, while $b$ depends on
$\xi_1$, $\sigma_1$ and $\alpha$.

These examples indicate that even when the class of alternatives is larger than
in the above problems, some improvement over the standard tests may be
possible provided good power is desired only against a narrow class of
alternatives.
4. Sufficient statistics. Before treating the problem of composite alternatives
we shall consider an important simplification that can be obtained by making
use of sufficient statistics. This notion was introduced by R. A. Fisher and was
[...] in which $x$ lies. A set in the range of $t$ is said to be measurable if the [...]
As was shown [...] Kolmogoroff [...] conditional probability
$P(B \mid t)$ of $B$ given $T = t$ uniquely up to a set of measure zero by the equation

(4.1)    $P(B \cap T^{-1}(A)) = \int_A P(B \mid t)\, dP^T(t)$ for all $A \in \mathfrak{A}$.

Suppose now that we are given a class of probability distributions for $X$,
$\mathfrak{F} = \{P_\theta\}$, $\theta \in \Omega$. Denote by $P_\theta(B \mid t)$ the conditional probability of $B$ given
$T = t$ computed for the distribution $P_\theta$. The statistic $T$ is said to be a sufficient
statistic for $\mathfrak{F}$ (or for $\theta$) if for every $P_\theta \in \mathfrak{F}$ there exists a determination of
$P_\theta(B \mid t)$ that is independent of $\theta$.

According to the above definition of statistic, $t(x)$ is an element of a
measurable partition. However, one may consider instead any function $t^*$ for which
$t^*(x) = t^*(x')$ if and only if $t(x) = t(x')$, that is, any function that leads to this
partition; the values that the function takes on are really immaterial. It will
be convenient here to use this wider definition of statistic. For a rigorous
treatment of some of the problems that will be referred to one needs to define an
equivalence of statistics and to include in this definition the appropriate nullset
considerations. A detailed account of these matters is given in [2] and [10].

From our present point of view tests are compared solely in terms of their
power functions. On this basis two tests $\phi_1$ and $\phi_2$ may be considered equivalent
if they have identical power, that is, if

(4.2)    $E_\theta \phi_1(X) = E_\theta \phi_2(X)$ for all $\theta \in \Omega$.

We can then state

Theorem 4.1. If $T$ is a sufficient statistic for $\theta$ and $\phi(X)$ any test of a
hypothesis concerning $\theta$ then there exists an equivalent test that is a function of $T$ only.

The proof of this theorem is immediate since

(4.3)    $\psi(T) = E[\phi(X) \mid T]$

is such a test.
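
Theorem 4.1 can be illustrated with independent Bernoulli trials, for which the number of successes is a sufficient statistic. The sketch below starts from a test that looks only at the first trial, forms its conditional expectation given the number of successes, checks that this conditional expectation does not depend on the parameter, and checks that the two tests have the same power function on a grid. The Bernoulli model, the particular test `phi` and the grid of parameter values are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def prob(x, p):
    """Probability of the 0/1 sequence x under independent Bernoulli(p) trials."""
    s = sum(x)
    return p**s * (1 - p)**(len(x) - s)


def phi(x):
    """Hypothetical test that uses only the first observation."""
    return x[0]


def conditional_test(n, p):
    """psi(t) = E_p[phi(X) | T = t], where T is the number of successes."""
    num = {t: Fraction(0) for t in range(n + 1)}
    den = {t: Fraction(0) for t in range(n + 1)}
    for x in product((0, 1), repeat=n):
        t = sum(x)
        num[t] += phi(x) * prob(x, p)
        den[t] += prob(x, p)
    return {t: num[t] / den[t] for t in num}


def power(test, n, p):
    """E_p test(X), computed by enumerating all sample points."""
    return sum(test(x) * prob(x, p) for x in product((0, 1), repeat=n))


n = 4
psi_a = conditional_test(n, Fraction(1, 3))
psi_b = conditional_test(n, Fraction(4, 5))
print("psi computed at p = 1/3:", psi_a)
print("psi computed at p = 4/5:", psi_b)

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    original = power(phi, n, p)
    reduced = power(lambda x: psi_a[sum(x)], n, p)
    print("p =", p, "power of phi:", original, "power of psi(T):", reduced)
```

Because $T$ is sufficient, the conditional expectation can be computed under any member of the family, which is what makes $\psi$ usable as a test.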

It follows from Theorem 4.1 that we lose nothing by restricting
consideration to tests based on a sufficient statistic.³ The problem of determining whether
or not some statistic is sufficient for a given family of distributions is simplified
through the use of a criterion for sufficiency that can be checked on sight. This
criterion is due to Neyman [13] who proved it in a somewhat special setting,
and was recently proved in a very general form by Halmos and Savage [2].
It states that if $\mathfrak{F} = \{p_\theta\}$, $\theta \in \Omega$, is a family of generalized probability densities
for $X$, then under certain mild restrictions a necessary and sufficient condition
for $T = t(X)$ to be a sufficient statistic for $\mathfrak{F}$ is that $p_\theta(x)$ factors into one
factor depending on $\theta$ but on $x$ only through $t(x)$ and a second factor depending
only on $x$.

The question arises as to which of various sufficient statistics to use. Since
the purpose of introducing sufficient statistics is to reduce the complexity of a
given statistical problem, one is led to seek a sufficient statistic that reduces
the problem as far as possible and hence to the notion of a minimal sufficient
statistic, a sufficient statistic $T$ being minimal if it is a function of every other
sufficient statistic (see [10]). It can be shown under fairly general conditions
that a minimal sufficient statistic exists, and one can give an explicit
construction for it.


³ A justification for the use of sufficient statistics in the general statistical decision
problem was given in [2].





As one would expect it turns out that the sufficient statistics commonly
associated with various families of distributions are actually minimal. Thus for
example, if $X_1, \dots, X_n$ are independently normally distributed with common
mean $\xi$ and variance $\sigma^2$, the statistic $(\bar{X}, \Sigma (X_i - \bar{X})^2)$ is a minimal sufficient
statistic for $\theta = (\xi, \sigma^2)$. If $X_1, \dots, X_n$ are independently uniformly distributed
over $(0, \theta)$, $\max(X_1, \dots, X_n)$ is the minimal sufficient statistic for $\theta$. If $\mathfrak{F}$ is
the family of distributions according to which $X_1, \dots, X_n$ are identically
independently distributed according to an arbitrary univariate distribution (or
according to an arbitrary probability density with respect to a fixed univariate
measure), then the minimal sufficient statistic is obtained by defining for each
point $x = (x_1, \dots, x_n)$ the set $t(x)$ as the set of points obtainable from $x$ by
permutation of coordinates. Alternatively one can define it by

$t(x_1, \dots, x_n) = (\Sigma x_i, \Sigma x_i^2, \dots, \Sigma x_i^n).$

6. The principle oJf invariance. The notion of invariance ■was introducer! info 
the statistical literature in the writings of 11, A. Fisher, Hotelling:, Fitnnm }*d(l| 
and others, in connection with various special problems. A getu'r.d fortmiln- 
tion was given by Hunt and Stein who, in an unpublishwi paper fr>|, utih? 4 *d 
this notion to find most stringent tests, and who obtained I he exfonph*;? r>f uni¬ 
formly most powerful invariant tests that will be given btlow. 'rim jmint of 
view in the present section is different from theirs liowe.vi'r, .■'ince liere invariance 
will only be considered as an intuitively appealing restriction that one ituiy 
wish to impose on statistical testa. 

We shall begin by considering an example. tSuppose it wore kiitmu tlml the 
height of people is distributed about a known mean, winch for ronvcnioni’c wo 
shall take to be zero, either according to a normal or to a Ciuichy dwtribution, 
with unknown scale factor so that either 



or 


(5.2) 


feix) 


1 


jre^ + 


0 < ^ < DP, 


Suppose wc Wish to test from a sample Xi, • • ■ , X„ the hypothcKis // that the 
true probability density belongs to the first of these claEses against, the* altcnia- 
tive that it belongs to the second. Then it seems desirablR that (he deeiftioti uf 
whether or not to accept H should be independent of the scale luloptitl for 
measuring the heights. For otherwise one worker expressing his data in foot 
m^t reject H while another worker using the same data but.exprming timra 

mi A "f connection wr for cxamplo 

is 1 f’ function <t> therefore would be indojmndcnt of the 

choice of scale, i.e., it would satisfy the condition 


(5.3) 4>icxi , • • ■ , cx„) _ 0 (a;i, -. , j-j for all c > 0 and for all (xi, • • ■ ,xj 
except possibly on a set X, independent of c and of measure zero, 



THEORY OF TESTING HYPOTHESES 




On analyzing this problem one is led to the following observation. Multiplying each of the random variables $X_1, \dots, X_n$ by the same constant leaves both $\omega$ and $\Omega - \omega$ invariant, i.e., if the $X$'s are normally distributed with zero mean and arbitrary scale so are $cX_1, \dots, cX_n$, and analogously for the Cauchy distributions. It is this fact that makes it so desirable to have $\varphi$ invariant under multiplication of the $x$'s by a common constant.

More generally consider measurable 1:1 transformations $g$ of $\mathcal{X}$ into itself, and let $Y = gX$. Suppose that when $X$ is distributed according to $P_\theta$, $\theta \in \omega$, $Y$ is distributed according to $P_{\theta'}$, $\theta' \in \omega$ (we shall then write $\theta' = \bar{g}\theta$), and that as $\theta$ ranges over $\omega$ so does $\theta'$. Suppose that the analogous condition is satisfied for $\Omega - \omega$, so that the problem of testing $\omega$ against $\Omega - \omega$ is left invariant under $g$. Now whether one expresses the observations in terms of $X$ or in terms of $Y$ is essentially a matter of choice of coordinates. The principle of invariance asks that if such a change of coordinates leaves the problem invariant, then it should also leave the test invariant, i.e., if $G$ is a group of measurable 1:1 transformations of $\mathcal{X}$ such that

(5.4)   $\bar{g}\omega = \omega$ and $\bar{g}(\Omega - \omega) = \Omega - \omega$ for all $g \in G$,

then $\varphi$ should satisfy the condition

(5.5)   $\varphi(gx) = \varphi(x)$ for all $g \in G$,

and for all $x$ except on a set $N$ independent of $g$ and such that $\mu(N) = 0$. If this condition were not satisfied, two workers, using the same data but expressing them in different coordinate systems might arrive at contrary conclusions.

As an example consider the general linear univariate hypothesis. In canonical form $X_1, \dots, X_r$; $X_{r+1}, \dots, X_s$; $X_{s+1}, \dots, X_n$ are independently normally distributed with common variance. The means of the first $s$ variables are unknown, the means of the last $n - s$ variables are known to be zero. The hypothesis states that the first $r$ means are zero. Adding arbitrary constants to each of the variables of the middle group leaves $\omega$ and $\Omega - \omega$ invariant. So does any orthogonal transformation of the first $r$ variables, and any orthogonal transformation of the last $n - s$ variables. Finally, the problem is also left invariant when all of the variables are multiplied by the same constant. It is easy to see that a function $\varphi$ is invariant under these transformations if and only if it is a function of

$\sum_{i=1}^{r} x_i^2 \Big/ \sum_{i=s+1}^{n} x_i^2.$

But, as is well known and easy to show, among all tests based on this statistic there is a uniformly most powerful one, namely the test that rejects $H$ when

$\sum_{i=1}^{r} x_i^2 \Big/ \sum_{i=s+1}^{n} x_i^2$

is too large. Therefore, among all tests satisfying the condition of invariance the standard test is uniformly most powerful.
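The invariance of the ratio statistic can be checked numerically. The following is only a sketch: the sample values, the choice r = 2, s = 4, n = 7, and the helper names ratio_statistic and rotate_pair are illustrative assumptions, and a plane rotation stands in for a general orthogonal transformation.

```python
import math

def ratio_statistic(x, r, s):
    """Sum of squares of the first r coordinates over that of the last n - s."""
    num = sum(v * v for v in x[:r])
    den = sum(v * v for v in x[s:])
    return num / den

def rotate_pair(block, angle):
    """Rotate the first two coordinates of a block (one orthogonal transformation)."""
    a, b = block[0], block[1]
    c, d = math.cos(angle), math.sin(angle)
    return [c * a - d * b, d * a + c * b] + list(block[2:])

# Hypothetical data in canonical form: r = 2, s = 4, n = 7.
r, s = 2, 4
x = [1.0, -2.0, 0.5, 3.0, 1.5, -0.5, 2.0]
first, middle, last = x[:r], x[r:s], x[s:]

shifted = first + [m + 10.0 for m in middle] + last    # constants added to the middle group
rot_first = rotate_pair(first, 0.7) + middle + last    # orthogonal map of the first r variables
rot_last = first + middle + rotate_pair(last, -1.3)    # orthogonal map of the last n - s variables
scaled = [4.0 * v for v in x]                          # common change of scale

w = ratio_statistic(x, r, s)
for y in (shifted, rot_first, rot_last, scaled):
    assert math.isclose(ratio_statistic(y, r, s), w)
```

The check covers only one transformation of each kind, so it illustrates the "if" direction of the statement and is not a proof of it.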





E. L. LEHMANN 


To formulate a corresponding reduction principle in general, we define a function $h$ on $\mathcal{X}$ to be maximal invariant (under $G$) if it is invariant and if $h(x') = h(x)$ implies the existence of $g \in G$ such that $x' = gx$. Then a function $\varphi$ on $\mathcal{X}$ is invariant under $G$ if and only if it depends on $x$ only through $h(x)$, that is, if there exists a function $\psi$ such that $\varphi(x) = \psi[h(x)]$. A necessary and sufficient condition for a test to be invariant under $G$ is that it be based on the statistic $Y = h(X)$. The principle of invariance then reduces the problem from $X$ to $Y = h(X)$. To determine the resulting statistical reduction, that is, the simplification of the parameter space, one may consider the group $\bar{G}$ of transformations over $\Omega$ induced by $G$. If $v(\theta)$ is a maximal invariant function under $\bar{G}$, it is easily shown that the distribution of $Y$ depends only on $v(\theta)$. Hence under the principle of invariance any two $\theta$-values with common $v(\theta)$ (that is, such that each can be obtained from the other by a transformation of $\bar{G}$) are identified. If in particular $v(\theta)$ is constant over $\omega$, the hypothesis $H$, when expressed for $Y$, becomes simple, and there may even exist a uniformly most powerful invariant test.
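As a small illustration of the definition of a maximal invariant, consider the group of common changes of scale $x \to cx$, $c > 0$, from the example at the start of this section. The sketch below assumes, for illustration only, the choice $h(x) = x/\|x\|$ on nonzero points; the function names are hypothetical. It checks that $h$ is invariant and that equality of $h$-values lets one recover a group element carrying one point into the other.

```python
import math

def h(x):
    """Candidate maximal invariant under x -> c x, c > 0 (assumes x is not the zero vector)."""
    norm = math.sqrt(sum(v * v for v in x))
    return tuple(v / norm for v in x)

def recover_scale(x, x_prime):
    """Return c > 0 with x_prime = c * x when h(x) and h(x_prime) agree, else None."""
    same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(h(x), h(x_prime)))
    if not same:
        return None
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_xp = math.sqrt(sum(v * v for v in x_prime))
    return norm_xp / norm_x

x = [3.0, -4.0, 12.0]
x_scaled = [2.5 * v for v in x]   # same orbit as x
x_other = [3.0, 4.0, 12.0]        # different orbit: one sign changed

c = recover_scale(x, x_scaled)        # close to 2.5
missing = recover_scale(x, x_other)   # None: no c > 0 maps x to x_other
```

Any invariant test can then be written as a function of h(x) alone, which is the reduction from $X$ to $Y = h(X)$ described above.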


Besides for the example already mentioned this is the case for Hotelling's $T^2$-problem and for the hypothesis specifying the value of a multiple correlation coefficient. Another example is obtained when $X_1, \dots, X_n$ are independently identically distributed, each with probability density $p(x)$, where under $H_i$, $p(x) = f_i(x - \theta)$, $(i = 0, 1)$, and where it is desired to test $H_0$ against $H_1$. One may also in this example replace the location parameter by a scale parameter or have both parameters present.


It may be worth noting that the likelihood ratio test is invariant under any transformation leaving the statistical problem invariant. In the problems concerning normal distributions mentioned above, when there exists a uniformly most powerful invariant test, it coincides with the likelihood ratio test. That this is not so in general can be seen from Stein's example given in section 1. There the problem remains invariant under multiplication of $X$ by $-1$, and there exists a uniformly most powerful invariant test. However, the likelihood ratio test is instead uniformly least powerful among the invariant tests of size $\alpha$, that is, among the tests of this size satisfying $\varphi(-x) = \varphi(x)$ for all $x$.

For many hypotheses one can find a group of transformations leaving the problem invariant and such that among the tests invariant under this group there exists a uniformly most powerful one. The question may be raised whether this approach is unique, or whether there might exist some other group of transformations also leaving the problem invariant but leading to a different test. Also in problems where among all invariant tests there does





not exist a uniformly most powerful one, the question arises whether one is 
using the totality of transformations leaving the problem invariant, or whether 
perhaps one can reduce the problem further. It therefore seems of interest to 
determine the totality of transformations leaving a given problem invariant. 
This was carried out for a few simple problems in [8]. 

We finally mention a connection between the notions of invariance and sufficiency. Consider any problem in which the variables $X_1, \dots, X_n$ are independently identically distributed under all distributions of $\Omega$. Such a problem clearly is left invariant under any permutation of the variables. Actually, these transformations leave not only $\omega$ and $\Omega - \omega$ invariant but each point of $\Omega$ individually. No essential reduction of the problem is obtained since the maximal invariant statistic is a sufficient statistic. It is easily seen that this will always be the case when the transformations leave $\Omega$ pointwise invariant, but that in this way one does not obtain all sufficient statistics. These can be obtained, however, by considering more general transformations, where each point $x$ of $\mathcal{X}$ is transformed into the points of $\mathcal{X}$ according to a probability distribution $P_x$.


6. The principle of unbiasedness. As a second principle of reduction we shall consider the principle of unbiasedness proposed by Neyman and Pearson. A test is said to be unbiased [19] if

$P_\theta(\text{rejecting } H) \ge \alpha$ for all $\theta \in \Omega - \omega$.

This seems a desirable property for a test to have since it assures that there do not exist $\theta_0$ in $\omega$ and $\theta_1$ in $\Omega - \omega$, for which

$P_{\theta_0}(\text{rejecting } H) > P_{\theta_1}(\text{rejecting } H)$.

We shall therefore be concerned in this section with the totality of tests $\varphi$ for which

(6.1)   $E_\theta \varphi(X) \le \alpha$ for all $\theta \in \omega$,
        $E_\theta \varphi(X) \ge \alpha$ for all $\theta \in \Omega - \omega$.


For a number of important special cases there exists, among all tests satisfying (6.1), one that is uniformly most powerful in $\Omega - \omega$ and uniformly least powerful in $\omega$. (The latter property is of course very desirable since when $H$ is true one wants to reject it as rarely as possible.) This follows immediately from well known results concerning best similar tests since for the problems in question $\Omega$ is a subset of a Euclidean space and for any test $\varphi$, $E_\theta \varphi(X)$ is a continuous function of $\theta$. If then $A$ is the set of points that are boundary points both of $\omega$ and of $\Omega - \omega$, it follows from (6.1) that

(6.2)   $E_\theta \varphi(X) = \alpha$ for all $\theta \in A$,

i.e., that $\varphi$ is similar for $\theta$ in $A$. But if among all tests satisfying (6.2) there exists one that is uniformly most powerful in $\Omega - \omega$ and uniformly least powerful in $\omega$, it automatically satisfies (6.1) as is seen by comparison with the test $\varphi(X) \equiv \alpha$.

As an example suppose that $X_1, \dots, X_n$ are independently normally distributed with common mean $\xi$ and common variance $\sigma^2$. If the hypothesis is $H_1$: $\sigma \le 1$ and the alternatives are $\sigma > 1$, the set $A$ becomes the line $\sigma = 1$. As was shown by Neyman and Pearson [18], among all tests satisfying (6.2) with this $A$, the test that rejects $H_1$ when $\sum (x_i - \bar{x})^2 > k$ (where $k$ is an appropriately chosen constant) is uniformly most powerful for $\theta$ in $\Omega - \omega$, and uniformly least powerful for $\theta$ in $\omega$.

If instead we consider testing the hypothesis $H_2$: $\sigma = 1$ against the alternatives $\sigma \ne 1$, we find that $A = \omega$, and our problem reduces to that of finding the best test among all those that are similar in $\omega$ and unbiased. As is well known, it turns out that rejecting when $\sum (x_i - \bar{x})^2 < k_1$ and when $\sum (x_i - \bar{x})^2 > k_2$ (where $k_1 < k_2$ are two appropriately chosen constants) is uniformly most powerful among all similar unbiased tests.
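The constants $k_1 < k_2$ for the two-sided test of $\sigma = 1$ can be computed numerically. The sketch below rests on a standard characterization that is brought in here as an assumption and is not stated in the text: with $\nu = n - 1$ and $F_\nu$ the $\chi^2$ distribution function with $\nu$ degrees of freedom, the test has size $\alpha$ and is unbiased when $F_\nu(k_2) - F_\nu(k_1) = 1 - \alpha$ and $F_{\nu+2}(k_2) - F_{\nu+2}(k_1) = 1 - \alpha$. To stay within the standard library the code handles only even $\nu$, for which the distribution function has a closed form; all function names are hypothetical.

```python
import math

def chi2_cdf_even(y, nu):
    """Chi-square distribution function for an even number nu of degrees of freedom."""
    if y <= 0:
        return 0.0
    t = y / 2.0
    term, total = 1.0, 1.0
    for j in range(1, nu // 2):
        term *= t / j
        total += term
    return 1.0 - math.exp(-t) * total

def bisect(func, lo, hi, iters=200):
    """Locate a sign change of func on [lo, hi] by bisection."""
    f_lo = func(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def upper_cutoff(k1, nu, alpha):
    """k2 such that P(k1 < Y < k2) = 1 - alpha for Y chi-square with nu degrees of freedom."""
    target = 1.0 - alpha + chi2_cdf_even(k1, nu)
    hi = max(2.0 * nu, k1 + 1.0)
    while chi2_cdf_even(hi, nu) < target:
        hi *= 2.0
    return bisect(lambda y: chi2_cdf_even(y, nu) - target, k1, hi)

def unbiased_cutoffs(nu, alpha):
    """Solve the size condition and the (assumed) unbiasedness condition for k1, k2."""
    # k1 must lie below the lower alpha-quantile q, where k2 would become infinite.
    q = bisect(lambda y: chi2_cdf_even(y, nu) - alpha, 0.0, 2.0 * nu)

    def bias(k1):
        k2 = upper_cutoff(k1, nu, alpha)
        return chi2_cdf_even(k2, nu + 2) - chi2_cdf_even(k1, nu + 2) - (1.0 - alpha)

    k1 = bisect(bias, 1e-9, q * (1.0 - 1e-9))
    return k1, upper_cutoff(k1, nu, alpha)

# Hypothetical case: n = 11 observations, so nu = 10, at level alpha = 0.05.
k1, k2 = unbiased_cutoffs(10, 0.05)
```

The resulting cutoffs straddle $\nu$ and in general differ from the equal-tails constants, which satisfy the size condition but not the second condition.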


A third hypothesis concerning $\sigma$ that might be of interest is $H_3$: $\sigma_1 \le \sigma \le \sigma_2$. Here $A$ consists of the two lines $\sigma = \sigma_1$ and $\sigma = \sigma_2$ and it is easy to show that the test that is uniformly most powerful in $\Omega - \omega$ and uniformly least powerful in $\omega$ rejects $H_3$ if and only if $\sum (x_i - \bar{x})^2 < c_1$ or $\sum (x_i - \bar{x})^2 > c_2$ where again $c_1 < c_2$ are two appropriately selected constants.

The question arises as to the connection of the principles of invariance and unbiasedness. Clearly if there exists a unique test $\varphi$ that is uniformly most powerful unbiased, this test is invariant under any group $G$ leaving the problem invariant. If then in addition there exists a uniformly most powerful invariant (under $G$) test, this must coincide with $\varphi$. Thus, if both principles lead to a unique optimum solution, these solutions coincide.

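We have seen that frequently optimum unbiased tests can be obtained through a study of tests that are similar over certain sets in the parameter space. The totality of similar tests was obtained for a number of important problems by Neyman and Pearson. In his 1937 paper on confidence intervals Neyman gave a general method for constructing similar regions with the help of sufficient statistics. Let $T$ be a sufficient statistic for $\theta \in A$. The condition for $\varphi$ to be similar with respect to $A$ and of size $\alpha$, is that

(6.3)   $E_\theta \varphi(X) = E_\theta E[\varphi(X) \mid T] = \alpha$ for all $\theta \in A$,

i.e., that

(6.4)   $E_\theta \{E[\varphi(X) \mid T] - \alpha\} = 0$ for all $\theta \in A$.

Clearly any test $\varphi$ for which

$E[\varphi(X) \mid t] = \alpha$ for almost all $t$

is similar with respect to $A$; such a test is said to have Neyman structure with respect to $T$. When all similar tests have this structure, the search for an optimum similar test reduces to an analytic problem the solution of which is known in many special cases. This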
method was first employed by P. L. Hsu [3] for some problems concerning normal distributions, and was extended to other cases in [7]. The present general formulation was given by H. Scheffé and the author in [9] and [10]. We shall say that a family of distributions $\{P_\theta^T\}$, $\theta \in A$, is boundedly complete if

(i) $f(t)$ is bounded,

(ii) $E_\theta f(T) = 0$ for all $\theta \in A$

imply $f(t) = 0$ except on a set $N$ with $P_\theta^T(N) = 0$ for all $\theta \in A$. Then we can state

Theorem 6.1. A necessary and sufficient condition for the totality of tests similar for $A$ to have Neyman structure with respect to a sufficient statistic $T$ is that $\{P_\theta^T\}$, $\theta \in A$, be boundedly complete.
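The completeness condition can be made concrete for a family that is not treated in this section and is used here purely as an illustrative assumption: $T$ binomial with $n$ trials and success probability $p$. For a function $f$ on $\{0, \dots, n\}$ the requirement $E_p f(T) = 0$ at $n + 1$ distinct values of $p$ is a square linear system in the unknowns $f(0), \dots, f(n)$; if its matrix is nonsingular, $f$ must vanish identically, so the family is complete and in particular boundedly complete. The sketch verifies nonsingularity exactly for one small, arbitrarily chosen case; the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_matrix(n, ps):
    """Row j holds P_{p_j}(T = t), t = 0..n, for T binomial with n trials."""
    return [[comb(n, t) * p**t * (1 - p)**(n - t) for t in range(n + 1)] for p in ps]

def determinant(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in rows]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        pivot = next((i for i in range(col, size) if a[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for i in range(col + 1, size):
            factor = a[i][col] / a[col][col]
            for k in range(col, size):
                a[i][k] -= factor * a[col][k]
    return det

n = 4
ps = [Fraction(k, n + 2) for k in range(1, n + 2)]  # n + 1 distinct values in (0, 1)
M = binomial_matrix(n, ps)
det = determinant(M)  # nonzero, so M f = 0 forces f = 0
```

Exact rational arithmetic is used so that the nonzero determinant is not an artifact of rounding.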

7. Tests whose power increases with the distance from the hypothesis. Frequently, even among the unbiased tests, there does not exist a uniformly most powerful one. The general univariate linear hypothesis with more than one constraint is an example of this situation. The following extension of the idea of unbiasedness may then be used to reduce the class of tests still further. Unbiasedness distinguishes between values of $\theta$ as they belong to $\omega$ or $\Omega - \omega$. However, one may further classify the points of $\Omega - \omega$ according to their "distance" from $\omega$, and then ask of a test $\varphi$ that the further $\theta$ be from $\omega$ the larger be the power $\beta_\varphi(\theta)$.

One possible such ordering of the alternatives is that induced by the envelope power function. Here the envelope power at $\theta$ (Wald [24]) is defined by

(7.1)   $\beta^*_\alpha(\theta) = \sup_{\varphi \in \Phi(\alpha)} \beta_\varphi(\theta)$,

where $\Phi(\alpha)$ is the class of all tests with $E_\theta \varphi(X) \le \alpha$ for all $\theta \in \omega$. Of two points $\theta_1$, $\theta_2$ one may then say that $\theta_1$ is closer to $\omega$ than $\theta_2$, equally close or less close, as $\beta^*_\alpha(\theta_1)$ is less than, equal to or greater than $\beta^*_\alpha(\theta_2)$. The distance of $\theta$ from $\omega$ is thus measured by the ease with which one can detect that the hypothesis is false when $\theta$ is the true parameter value.

When $\theta$ lies in a Euclidean space and $\beta_\varphi(\theta)$ is a continuous function of $\theta$ for all $\varphi$, as is the case in most applications, the condition that the power increase with $\beta^*_\alpha$ will usually imply that $\beta_\varphi(\theta_1) = \beta_\varphi(\theta_2)$ whenever $\beta^*_\alpha(\theta_1) = \beta^*_\alpha(\theta_2)$. In the case of the general linear hypothesis considered in section 5, for example, one would obtain the condition that the power be a function only of $\sum_{i=1}^{r} \xi_i^2 / \sigma^2$, where $\xi_i = E(X_i)$. As was shown by P. L. Hsu [3], the standard (likelihood ratio) test is uniformly most powerful among all tests satisfying this condition. Analogous remarks apply to Hotelling's $T^2$-problem, and to the hypothesis specifying the value of the multiple correlation coefficient. The corresponding optimum properties in these cases were proved by Simaika [21].

It is interesting to compare the above condition with that of invariance. 





This comparison yields nothing of interest if the totality of tests is considered. We may, however, restrict our attention to tests depending only on a sufficient statistic $T$. We already know that $\varphi(X)$ and $E[\varphi(X) \mid T]$ have identical power.

In order to validate the comparison we wish to make, we state the following

Lemma. Let $T$ be a sufficient statistic for $\theta \in \Omega$, and let $G$ be a group of 1:1 transformations $g$ on $\mathcal{X}$ leaving $\Omega$ invariant. Then if $\varphi$ is invariant under $G$, $E[\varphi(X) \mid t]$ is almost invariant under $G$.

We can now state the desired comparison in the following

Theorem 7.1. Let $G$ be a group of 1:1 transformations on $\mathcal{X}$, let $\bar{G}$ be the induced group of transformations on $\Omega$, let $v(\theta)$ be maximal invariant under $\bar{G}$, and suppose that $\bar{G}$ leaves $\omega$ and $\Omega - \omega$ invariant. Suppose further that $T$ is a sufficient statistic for $\Omega$, and that $\{P_\theta^T\}$, $\theta \in \Omega$, is boundedly complete. Then a necessary and sufficient condition that the power of a test $\varphi(T)$ be a function only of $v(\theta)$, is that $\varphi(t)$ be almost invariant under $G$.

This theorem is an immediate extension of some results of Wolfowitz. Theorem 7.1 together with the results of section 5 proves that the standard tests of the general linear hypothesis, Hotelling's $T^2$-problem and the hypothesis concerning the multiple correlation coefficient possess the optimum property that was obtained for these problems by Hsu and Simaika, respectively. The method of proof indicated here is due to Wolfowitz [35].


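8. Most stringent tests. We shall now turn to the third aspect of the theory: Optimum properties defined with reference to the whole class of alternatives, and attainable with no restrictions imposed on the class of tests. In the present section we shall consider the property of stringency. Wald [25] defines a test $\varphi$ to be most stringent if it minimizes

(8.1)   $\sup_{\theta \in \Omega - \omega} [\beta^*_\alpha(\theta) - \beta_\varphi(\theta)]$,

where again $\beta^*_\alpha$ denotes the envelope power, and $\beta_\varphi$ the power of $\varphi$. The rationale of this definition is clear. The difference $\beta^*_\alpha(\theta) - \beta_\varphi(\theta)$ is the amount by which the power of $\varphi$ at $\theta$ falls short of the maximum power attainable there.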

A theory of most stringent tests was developed by Hunt and Stein [5], who obtained their results in connection with the following groups of transformations.

(i) $gx = x + a$, $x$ a real variable;

(ii) $gx = ax$, $0 < a$, $x$ a real variable;

(V) *-«:ns7^ 





Theorem 8.1. (Hunt and Stein). If $G$ is the direct product of a finite number of groups of types (i)-(v), and if $G$ leaves the problem invariant, that is, if $G$ satisfies (5.4), then there exists a most stringent test invariant under $G$.

Actually, it is not necessary here to require that $G$ be a direct product. The result holds also if the factoring of $G$ is according to normal subgroups, where the normal subgroup at each stage and the final factor group are of the types mentioned. In the light of this one may omit type (iii) from the list since it has a normal subgroup of type (i) with factor group of type (ii).

The proof of Theorem 8.1 is based on the following lemma, which has applications to many related problems.

Lemma (Hunt and Stein). If $G$ is a direct product of a finite number of groups of types (i)-(v) then given any function $f$ over $\mathcal{X}$ $(0 \le f(x) \le 1)$ there exists a function $F$ $(0 \le F(x) \le 1)$ such that $F$ is invariant under $G$, and

(8.2)   $\inf_{g \in G} \int f(gx)\varphi(x)\, d\mu(x) \le \int F(x)\varphi(x)\, d\mu(x) \le \sup_{g \in G} \int f(gx)\varphi(x)\, d\mu(x)$

for all $\varphi$ that are integrable $\mu$.
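For a finite group the function $F$ of the lemma can be obtained by averaging $f$ over the group, since the middle integral in (8.2) is then the average of the integrals on either side. The sketch below illustrates this under assumptions that are not taken from the text: a five-point sample space with counting measure, the two-element group generated by $x \to -x$, and arbitrarily chosen values for $f$ and $\varphi$.

```python
from fractions import Fraction

points = [-2, -1, 0, 1, 2]
group = [lambda x: x, lambda x: -x]   # identity and sign change

# Hypothetical f with values in [0, 1] and a hypothetical integrable phi.
f = {-2: Fraction(9, 10), -1: Fraction(1, 5), 0: Fraction(1, 2), 1: Fraction(3, 5), 2: Fraction(0)}
phi = {-2: Fraction(1), -1: Fraction(3), 0: Fraction(2), 1: Fraction(1, 2), 2: Fraction(4)}
mu = {x: Fraction(1) for x in points}  # counting measure

def averaged(values):
    """Average of values(gx) over the group: an invariant function."""
    return {x: sum(values[g(x)] for g in group) / len(group) for x in points}

def integral(values, transform=lambda x: x):
    """Integral of values(transform(x)) * phi(x) with respect to mu."""
    return sum(values[transform(x)] * phi[x] * mu[x] for x in points)

F = averaged(f)
middle = integral(F)
translates = [integral(f, g) for g in group]
```

With these numbers the two outer integrals are 14/5 and 13/2 and the middle one is their mean, 93/20. For groups without a finite invariant measure no such direct average is available, and this sketch does not apply.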

It follows from Theorem 8.1 that if there exists a uniformly most powerful invariant test, this test is most stringent. In this way Hunt and Stein show, for example, (see in this connection section 5), that the likelihood ratio test of the general univariate linear hypothesis is most stringent. A question that is left open is the uniqueness of such a most stringent test.

In general, the possibility therefore remains that there might exist another most stringent test uniformly more powerful than the invariant one. In certain particular cases this possibility can be ruled out by the following consideration. Suppose that $\Omega$ is a subset of a Euclidean space and that every point of $\omega$ is a limit point of $\Omega - \omega$. Suppose further that for any test $\varphi$, $E_\theta \varphi(X)$ is continuous in $\theta$. Then clearly, if $\varphi_1$ is similar of size $\alpha$ for testing $\omega$ and $\varphi_2$ is of size $\le \alpha$ but not similar, $\varphi_2$ can not be uniformly as powerful as $\varphi_1$. Hence any test that is admissible among all similar tests of size $\alpha$ is also admissible among the totality of tests of size $\le \alpha$. Now admissibility among all similar tests is sometimes not too difficult to prove. For the likelihood ratio test of the general linear univariate hypothesis, for example, it is an immediate consequence of the properties of this test proved by Wald [23] and Hsu [4].

The following alternative method for obtaining most stringent tests is also mentioned by Hunt and Stein.

Theorem 8.2. (Hunt and Stein). Let $\Omega - \omega$ be partitioned into disjoint subsets $\Omega_\delta$ such that $\beta^*_\alpha(\theta)$ is constant on each $\Omega_\delta$, and let $\varphi_\delta$ be the test that maximizes $\inf_{\theta \in \Omega_\delta} \beta_\varphi(\theta)$. Then if $\varphi_\delta = \varphi$ is independent of $\delta$, $\varphi$ is most stringent.

This result may be supplemented by the following method for finding tests that maximize $\inf_{\theta \in \omega_1} \beta_\varphi(\theta)$ over a given set of alternatives $\omega_1$ (not necessarily satisfying the conditions imposed above on the $\Omega_\delta$'s).





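Theorem 8.3. Consider probability measures $\lambda$ and $\lambda_1$ over $\omega$ and $\omega_1$. Let the functions $f_\theta(x)$ be generalized probability densities with respect to $\mu$, so that $h(x) = \int_\omega f_\theta(x)\, d\lambda(\theta)$ and $h_1(x) = \int_{\omega_1} f_\theta(x)\, d\lambda_1(\theta)$ are probability densities with respect to $\mu$. Let $\varphi$ be the most powerful test of size $\alpha$ for testing the simple hypothesis $h$ against the simple alternative $h_1$, and suppose that the power of $\varphi$ against $h_1$ is $\beta$. Then if

$E_\theta \varphi(X) \le \alpha$ for all $\theta \in \omega$,

$E_\theta \varphi(X) \ge \beta$ for all $\theta \in \omega_1$,

$\varphi$ maximizes $\inf_{\theta \in \omega_1} \beta_\varphi(\theta)$ at level of significance $\alpha$.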
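The two conditions of this theorem are easy to check in a one-parameter case. The sketch below uses an assumed setting chosen only for illustration: $T$ binomial with $n = 10$ trials, hypothesis $p \le 0.3$, alternatives $p \ge 0.6$, and measures $\lambda$, $\lambda_1$ that put probability 1 on $p_0 = 0.3$ and $p_1 = 0.6$. The test rejecting when $T \ge c$ is most powerful for $p_0$ against $p_1$ at its own attained size, which is taken as $\alpha$ so that no randomization is needed. The conditions are checked on a finite grid only, which suggests but does not prove that they hold throughout.

```python
from math import comb

def power(p, n, c):
    """P_p(T >= c) for T binomial (n, p): power of the test rejecting when T >= c."""
    return sum(comb(n, t) * p**t * (1 - p)**(n - t) for t in range(c, n + 1))

n, c = 10, 6          # hypothetical sample size and cutoff
p0, p1 = 0.3, 0.6     # support points of lambda and lambda_1 (assumed)

alpha = power(p0, n, c)   # attained size at the boundary of the hypothesis
beta = power(p1, n, c)    # power against the simple alternative p1

omega_grid = [i / 100 for i in range(0, 31)]       # grid over p <= p0
omega1_grid = [i / 100 for i in range(60, 101)]    # grid over p >= p1

size_ok = all(power(p, n, c) <= alpha + 1e-12 for p in omega_grid)
power_ok = all(power(p, n, c) >= beta - 1e-12 for p in omega1_grid)
```

When both checks hold on the whole of $\omega$ and $\omega_1$, the theorem gives that this test maximizes the minimum power over $p \ge p_1$ among tests of level $\alpha$.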

This method, when applicable, has the advantage of giving the totality of most stringent tests (see in this connection Theorem 3.1) and hence of settling the question of admissibility. However, in many applications probability measures $\lambda$, $\lambda_1$ with the desired properties do not exist but instead only sequences $\lambda^{(n)}$, $\lambda_1^{(n)}$, which satisfy the conditions in the limit. In this case again only the weak conclusion is possible: The test obtained is most stringent but has not been proved admissible. (For an example in which the analogous method has been carried through in detail for an estimation problem, see [2i].)
Actually, the two methods are closely related, as can be seen from the proof of the main lemma. In those cases in which there exists a group $G$ giving the maximum possible reduction, the group $\bar{G}$ induces a partition of $\Omega$ (through the equivalence: $\theta_1 \sim \theta_2$ if there exists $\bar{g}$ such that $\theta_2 = \bar{g}\theta_1$), just into $\omega$ and the sets $\Omega_\delta$. (This is so mainly because, as was shown by Hunt and Stein, the envelope power remains invariant under any transformations that leave the problem invariant.) Then the measures $\lambda$, $\lambda_\delta$ over $\omega$, $\Omega_\delta$ respectively, which figure in the application of Theorems 8.2 and 8.3, become invariant measures over $G$ through the obvious 1:1 mapping from $\omega$ and the $\Omega_\delta$'s respectively to $G$. Thus the second method will allow the strong conclusion when the group $G$ involved in the first method possesses a finite invariant measure [types (iv) and (v)] but not if any of its factors are of type (i)-(iii).

To conclude this section we shall give an example where the method of invariance leads only to a partial reduction but where the solution may be completed by certain additional considerations. Suppose that $(X_1, \dots, X_n)$ is a sample from a normal distribution with mean $\xi$ and variance $\sigma^2$, both unknown, and that we wish to find the most stringent test of the hypothesis $H$: $\sigma = 1$ against the alternatives $\sigma \ne 1$. Theorem 8.1 reduces the problem to the statistic $Y = \sum (X_i - \bar{X})^2$, but among the tests of $H$ based on this statistic there does not exist a uniformly most powerful one. It may also be shown [8] that no further reduction is possible by means of the method of invariance.

However, one may now consider the problem of finding the most stringent test based on $Y$. (The envelope power function $\beta^*(\xi, \sigma)$ that must be used





naturally is not the one for $Y$ but that for the original problem.) From an argument given in [6] it follows that this test is of the form

$\varphi_{k_1,k_2}$: reject when $Y < k_1$ or $Y > k_2$,

where $k_1$, $k_2$ are determined by the two conditions

(i) $P(\text{rejection} \mid \sigma = 1) = \alpha$,

(ii) $\sup_{\sigma < 1} [\beta^*(\xi, \sigma) - \beta_\varphi(\sigma)] = \sup_{\sigma > 1} [\beta^*(\xi, \sigma) - \beta_\varphi(\sigma)]$.

Here $\beta^*(\xi, \sigma)$ is independent of $\xi$ and can be obtained from a table of the $\chi^2$-distribution (with $n$ degrees of freedom for $\sigma < 1$ and $n - 1$ degrees of freedom for $\sigma > 1$ as can be seen from (3.6)). Hence $k_1$ and $k_2$ can be computed fairly easily.

Another problem that may be treated in this way is the hypothesis of equality of variances for two normal samples. If the two samples are of equal size, there exists a uniformly most powerful invariant test for a suitable group of transformations. However, if the sample sizes are different the method of invariance reduces the problem only to $\sum (X_i - \bar{X})^2 / \sum (Y_i - \bar{Y})^2$, and the cut off points giving the most stringent test may be determined by an argument analogous to that given above.

This method may be extended to allow determination of most stringent tests of hypotheses such as $H$: $\sigma_1 \le \sigma \le \sigma_2$. This requires a certain modification of Theorem 1 of [6], which is easily obtained. One finds again that one may restrict consideration to a one-parameter family of tests (determined by a somewhat different condition than above), and that among these the most stringent test is obtained by the analogue of condition (ii) above.

It should be mentioned that the results of [6] apply also to the hypothesis specifying the value of the parameter in a binomial or Poisson distribution. This is easily seen since in either case the distributions of $\Omega$ are absolutely continuous with respect to a common sigma finite measure and since for the appropriate choice of this measure the generalized density is of the form assumed for the density in [6]. Hence in both the binomial and the Poisson case the most stringent test is determined by conditions analogous to (i) and (ii) above.

9. Tests that minimize the maximum loss. In the Neyman-Pearson theory 
one classifies the errors into two kinds: Rejecting the hypothesis when it is 
true, accepting it when it is false. One may however analyze the situation further 
and distinguish, say, between accepting when one or some other alternative is 
true. Thus one is led to introduce the losses that result in a given situation from 
the various possible errors, and to look for a test that, in an appropriate sense,
minimizes the expected loss. This possibility was mentioned by Neyman and 
Pearson [17], and was taken as the starting point of his general theory by Wald 
(see for example [24]). 

In order to stay within the framework of this exposition we shall here in¬ 
troduce losses only for the errors of accepting the hypothesis when it is false. 





while still demanding that the probability of rejection when the hypothesis is true should not exceed $\alpha$. Actually, there are many cases where this seems to be a reasonable formulation. For it frequently happens that the two types of error entail consequences of such completely different nature that the resulting losses cannot be measured on a common scale while usually the different errors of the same type are comparable.

We shall therefore assume that for each $\theta \in \Omega - \omega$ there is defined a function $W(\theta)$, which measures the loss resulting from acceptance of $H$ when $\theta$ is true. The risk which one runs by using a test $\varphi$, when $\theta \in \Omega - \omega$ is the true parameter value is given by the expected loss $R_\varphi(\theta) = W(\theta) E_\theta[1 - \varphi(X)]$. When a uniformly most powerful test exists for the hypothesis in question, this test also minimizes the expected loss uniformly for $\theta$ in $\Omega - \omega$. In the contrary case one may again restrict the class of tests in some way, so that within the restricted class there exists a uniformly most powerful test, and hence a test that uniformly minimizes the expected loss. Alternatively we may again consider some optimum property of the risk function $R_\varphi(\theta)$ as a whole. We shall here consider the minimax principle introduced by Wald, and seek a test, which, subject to $E_\theta \varphi(X) \le \alpha$ for all $\theta \in \omega$, minimizes

$\sup_{\theta \in \Omega - \omega} R_\varphi(\theta)$,

the maximum risk.

If one introduces losses also for the other type of error it is easy to see that for a suitably chosen loss function the definition of minimax expected loss coincides with that of stringency. It is therefore not surprising that the methods of the previous section can be extended to cover the problems considered in the present one. (They are actually much more general, and may be applied also, for example, to the problem of point estimation, and in fact to the general decision problem).

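From the lemma of Hunt and Stein stated in the previous section we immediately obtain the following extension of Theorem 8.1.

Theorem 9.1. If $G$ is a group of transformations leaving the hypothesis and the class of alternatives invariant, if $G$ can be factored by normal subgroups into factors of types (i)-(v), and if the loss function $W(\theta)$ is invariant under $\bar{G}$, then there exists a test $\varphi$ invariant under $G$ and minimizing

(9.1)   $\sup_{\theta \in \Omega - \omega} W(\theta) E_\theta[1 - \varphi(X)]$.

When a uniformly most powerful invariant test exists, this test therefore minimizes the maximum risk. Theorem 8.2 also extends to the present problem if one replaces the sets $\Omega_\delta$ by sets over which $W(\theta)$ is constant.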





Again it may happen that the method of invariance does not reduce the problem sufficiently far but that the solution may be completed by other considerations. Let us once more consider the hypothesis $H$: $\sigma = 1$ of the previous section, and let us suppose that the loss function has the necessary invariance property, so that it is a function only of $\sigma$ but not of the unknown mean. It follows from Theorem 9.1 that there exists a test minimizing the maximum risk, which is a function only of $Y = \sum (X_i - \bar{X})^2$. From [6] it is easily seen that a test which rejects when $Y < k_1$ or $Y > k_2$, has the desired property if its size is $\alpha$ and if in addition

(9.2)   $\sup_{\sigma < 1} W(\sigma) E_\sigma[1 - \varphi(Y)] = \sup_{\sigma > 1} W(\sigma) E_\sigma[1 - \varphi(Y)]$.

It follows that depending on the choice of $W(\sigma)$ the solution may be any member of the one-parameter family of tests $\varphi_{k_1,k_2}$ of size $\alpha$.

Under the conditions of Theorem 9.1, when a uniformly most powerful invariant test exists, this also maximizes the average power for a large class of weight functions. If there exists a common finite invariant measure over the sets $\Omega_\delta$ in the sense indicated in section 8, the uniformly most powerful invariant test will maximize the average power with this measure as weight function, over $\Omega_\delta$ for all $\delta$. It follows that it maximizes the average power over $\Omega - \omega$ with respect to any weight function for which the conditional distribution over each $\Omega_\delta$ is the above invariant measure. If the invariant measure over the $\Omega_\delta$'s is not finite one can obtain analogous results with respect to a sequence of weight functions invariant in the limit. The results indicated here are much weaker than those obtained for the general linear univariate hypothesis by Wald [23] and Hsu [4] under the restriction to similar regions. However their results are no longer valid when this restriction is omitted.

10. Applications to sequential analysis. So far we have restricted considera¬ 
tion to the case that the hypothesis is to be tested on the basis of a preassigned 
experiment. However, frequently there is available for this purpose a large class 
of experiments, and the selection of an optimum experiment out of this class is 
part of the problem. We shall consider here only the following situation, which 
has recently been studied extensively (see Wald [28, 29]). There is given a sequence of random variables $X_1, X_2, \dots$ whose joint distribution is known to belong to some family $\mathcal{P} = \{P_\theta\}$, $\theta \in \Omega$; the hypothesis specifies some subfamily: $\theta \in \omega$. The $X$'s are observed one by one, and the decision, whether or not to continue experimentation at any given stage, is allowed to depend on the observations taken up to that point. Thus the number $n$ of observations that will be taken is a random variable whose distribution depends on $\theta$. Usually, by an appropriate choice of stopping rule, there may be effected a considerable saving
in the expectation of the number of observations necessary to achieve a given 
discrimination between hypothesis and alternatives. The problem is to deter¬ 
mine the stopping rule and test that minimizes this expectation. 

As we have seen in the previous sections the principal methods for obtaining 





optimum tests consist in reducing the problem to that of testing a simple hypothesis against a simple alternative. This basic problem was solved in the non-sequential case by Neyman and Pearson (Theorem 3.1). The solution of the much more difficult corresponding sequential problem was obtained for a large class of cases by Wald and Wolfowitz [31] in the following

Theorem 10.1. Let $X_1, X_2, \dots$ be identically and independently distributed. It is desired to test the hypothesis that the common probability density of the $X$'s is $f(x)$ against the alternative that it is $g(x)$. Given two numbers $0 < \alpha < \beta < 1$, there exists a test which, subject to the condition

(10.1)   $P(\text{rejection} \mid f) \le \alpha$,
         $P(\text{rejection} \mid g) \ge \beta$,

minimizes simultaneously $E_f(n)$ and $E_g(n)$, the expected number of observations computed for the distributions $f$ and $g$. This test is given in terms of two numbers $A$ and $B$ by the following rule. After $m$ observations have been taken,

reject if $\frac{g(x_1) \cdots g(x_m)}{f(x_1) \cdots f(x_m)} \ge A$,

accept if $\frac{g(x_1) \cdots g(x_m)}{f(x_1) \cdots f(x_m)} \le B$,

take another observation if $B < \frac{g(x_1) \cdots g(x_m)}{f(x_1) \cdots f(x_m)} < A$.

Here $A$ and $B$ are determined so that condition (10.1) holds with the inequality signs replaced by equality.
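The stopping rule of Theorem 10.1 is short to write down in code. The sketch below is a minimal illustration under stated assumptions: the densities are those of a single success-failure trial with success probabilities 0.3 (hypothesis) and 0.6 (alternative), and the limits are set by the approximations $A \approx \beta/\alpha$, $B \approx (1 - \beta)/(1 - \alpha)$ due to Wald, which are not part of the theorem and only approximate the values for which (10.1) holds with equality. The function names are hypothetical.

```python
import math

def sprt(observations, f, g, A, B):
    """Probability ratio test of density f (hypothesis) against g (alternative).

    Returns (decision, m): decision is "reject", "accept", or "continue" when
    the observations run out first; m is the number of observations used.
    """
    log_ratio = 0.0
    m = 0
    for m, x in enumerate(observations, start=1):
        log_ratio += math.log(g(x)) - math.log(f(x))
        if log_ratio >= math.log(A):
            return "reject", m
        if log_ratio <= math.log(B):
            return "accept", m
    return "continue", m

def bernoulli(p):
    """Density of one trial: p for a success (1), 1 - p for a failure (0)."""
    return lambda x: p if x == 1 else 1.0 - p

alpha, beta = 0.05, 0.90          # bound on size, required power
A = beta / alpha                  # Wald's approximation (assumption)
B = (1.0 - beta) / (1.0 - alpha)  # Wald's approximation (assumption)
f, g = bernoulli(0.3), bernoulli(0.6)

all_successes = sprt([1] * 10, f, g, A, B)   # ("reject", 5)
all_failures = sprt([0] * 10, f, g, A, B)    # ("accept", 5)
undecided = sprt([1, 0], f, g, A, B)         # ("continue", 2)
```

The ratio is accumulated on the logarithmic scale to avoid underflow in long sequences; the decisions are the same as for the ratio itself.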

So as to be able to treat the various problems considered non-sequentially in the previous sections one would have to extend this theorem at least to the case that the variables $X_1, X_2, \dots$ form a set of equivalent variables in the sense of de Finetti [1]. Instead, we shall here restrict ourselves to a few problems that can be solved on the basis of Theorem 10.1. All of the tests discussed below were proposed from other points of view and some of their properties were discussed by Girshick in his important "Contributions to the theory of sequential analysis," Annals of Math. Stat., vol. 17 (1946) pp. 123-143 and 282-298, and by Wald in his basic book on the subject [28].

Hypothesis testing. Let the parameter space $\Omega$ be divided into three sets: the
hypothesis $\omega_0$, the class of alternatives $\omega_1$, and a region of indifference.
Let $X = (X_1, X_2, \ldots)$ denote the sequential random variable. We seek a
sequential test $\varphi$ which, subject to

$$E_\theta \varphi(X) \le \alpha \ \text{ for } \theta \in \omega_0, \qquad E_\theta \varphi(X) \ge \beta \ \text{ for } \theta \in \omega_1, \tag{10.2}$$

minimizes $\sup_{\theta \in \omega_0 + \omega_1} E_\theta(n)$. (Actually, this is a rather artificial formulation. The
natural requirement is the minimization of $\sup_{\theta \in \Omega} E_\theta(n)$, but this is a much more
difficult problem.) The reduction to the problem of testing a simple hypothesis
against a simple alternative is achieved by the following obvious extension of
Theorem 8.3.

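Theorem 10.2. Let $\lambda_0, \lambda_1$ be distributions over $\omega_0, \omega_1$ respectively, and let $\varphi$ be
a test which, subject to

$$\int_{\omega_0} E_\theta \varphi(X)\, d\lambda_0(\theta) \le \alpha, \qquad \int_{\omega_1} E_\theta \varphi(X)\, d\lambda_1(\theta) \ge \beta, \tag{10.3}$$

minimizes $\sup_{i=0,1} \int_{\omega_i} E_\theta(n)\, d\lambda_i(\theta)$. Then if $\varphi$ satisfies (10.2) and

$$E_\theta(n) \le \sup_{i=0,1} \int_{\omega_i} E_\theta(n)\, d\lambda_i(\theta) \quad \text{for all } \theta \in \omega_0 + \omega_1, \tag{10.4}$$

$\varphi$ minimizes $\sup_{\theta \in \omega_0 + \omega_1} E_\theta(n)$ subject to (10.2).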


As in section 3 we can make certain trivial applications to problems concerning
a single real parameter, such as testing the hypothesis $H: p \le p_0$ against the
alternatives $p \ge p_1$ ($p_0 < p_1$), where $p$ is the probability of success in a binomial
sequence of trials. In this example condition (10.2) of Theorem 10.2 obviously
is satisfied when $\lambda_0$ and $\lambda_1$ assign probability 1 to $p_0$ and $p_1$ respectively. Hence
the probability ratio test for testing $p = p_0$ against $p = p_1$ has the desired
properties, whenever (10.4) holds, that is, whenever $E_p(n)$ attains its maximum
between $p_0$ and $p_1$.
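For the binomial case the probability ratio after $m$ trials with $s$ successes is $(p_1/p_0)^s \, ((1-p_1)/(1-p_0))^{m-s}$, so the rule "reject when the ratio is at least $A$, accept when it is at most $B$" becomes a pair of parallel straight-line boundaries for $s$ as a function of $m$. The sketch below works this out; the function names are hypothetical, and the thresholds $A$ and $B$ are again assumed to be supplied by the user rather than computed from $\alpha$ and $\beta$.

```python
import math


def binomial_sprt_lines(p0, p1, A, B):
    """Return (slope, reject_intercept, accept_intercept).

    With d = log(p1 (1 - p0) / (p0 (1 - p1))) > 0 for p0 < p1, the log ratio
    is s * d + m * log((1 - p1) / (1 - p0)), so
      ratio >= A  <=>  s >= reject_intercept + slope * m,
      ratio <= B  <=>  s <= accept_intercept + slope * m.
    """
    d = math.log(p1 * (1.0 - p0) / (p0 * (1.0 - p1)))
    slope = math.log((1.0 - p0) / (1.0 - p1)) / d
    return slope, math.log(A) / d, math.log(B) / d


def binomial_sprt(trials, p0, p1, A, B):
    """Run the test on a sequence of 0/1 outcomes; returns (decision, m)."""
    slope, h_reject, h_accept = binomial_sprt_lines(p0, p1, A, B)
    successes = 0
    m = 0
    for outcome in trials:
        m += 1
        successes += outcome
        if successes >= h_reject + slope * m:
            return "reject", m
        if successes <= h_accept + slope * m:
            return "accept", m
    return "continue", m
```

Expressing the rule through the running count of successes makes it easy to carry out by hand or to plot, which is the only reason for preferring it here to a direct evaluation of the ratio.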

The following is another example that may be solved in this manner. Let
$X_1, X_2, \ldots;\ Y_1, Y_2, \ldots$ be independently normally distributed, all with
unit variance and means $E(X_i) = \xi$, $E(Y_i) = \eta$. In order to test the hypothesis
$H: \xi \ge \eta$ against the alternatives $\eta - \xi \ge \delta$, where $\delta > 0$ is given, a pair $(X_1, Y_1)$
is observed. If after this observation experimentation continues, another pair
$(X_2, Y_2)$ is observed, etc. In this case we may take for $\lambda_0, \lambda_1$ the distributions
that assign probability 1 to the parameter points $(\xi, \eta) = (0, 0)$ and $\left(-\frac{\delta}{2}, \frac{\delta}{2}\right)$
respectively. Then the probability ratio after $m$ observations is given by

$$\exp\left[\frac{\delta}{2} \sum_{i=1}^{m} (y_i - x_i) - \frac{m\delta^2}{4}\right]. \tag{10.5}$$

Since the distribution of $Y - X$ depends only on $\eta - \xi$, it is easily seen that
condition (10.2) is satisfied.
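As a check on the probability ratio for this example, one can compare the closed form with a direct evaluation from the normal densities. With the two parameter points $(0, 0)$ and $(-\delta/2, \delta/2)$ and unit variances, the logarithm of the ratio works out to $\frac{\delta}{2}\sum (y_i - x_i) - m\delta^2/4$. The sketch below, with hypothetical function names, computes it both ways.

```python
import math


def log_ratio_closed_form(xs, ys, delta):
    """Log probability ratio: (delta/2) * sum(y_i - x_i) - m * delta^2 / 4."""
    m = len(xs)
    return 0.5 * delta * sum(y - x for x, y in zip(xs, ys)) - m * delta ** 2 / 4.0


def log_ratio_from_densities(xs, ys, delta):
    """Same quantity computed directly from unit-variance normal log densities."""

    def log_phi(z, mean):
        return -0.5 * (z - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)

    # Alternative: X has mean -delta/2 and Y has mean +delta/2.
    numerator = sum(
        log_phi(x, -delta / 2.0) + log_phi(y, delta / 2.0) for x, y in zip(xs, ys)
    )
    # Hypothesis point: both means are zero.
    denominator = sum(log_phi(x, 0.0) + log_phi(y, 0.0) for x, y in zip(xs, ys))
    return numerator - denominator
```

The ratio depends on the data only through the differences $y_i - x_i$, which is the property used in the text to verify the level and power conditions.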

Some further results can be obtained through extension to the sequential case
of Theorems 8.1 and 9.1.

Theorem 10.3. Suppose that $\Omega$ is of the type described in Theorem 9.1, let
$Y = f(X_1, X_2, \ldots)$ be maximal invariant under $G$, let $v(\theta)$ be maximal invariant
under $\bar{G}$, and let the sets of values of $v(\theta)$ corresponding to $\omega_0$ and $\omega_1$ be $\bar\omega_0$ and
$\bar\omega_1$, respectively. If among all tests of $\bar\omega_0$ against $\bar\omega_1$ based on $Y$, the test $\varphi$
minimizes $\sup E_\theta(n)$ subject to

$$E_\theta \varphi(Y) \le \alpha \ \text{ if } v(\theta) \in \bar\omega_0, \qquad E_\theta \varphi(Y) \ge \beta \ \text{ if } v(\theta) \in \bar\omega_1, \tag{10.6}$$

then $\varphi$ also minimizes $\sup E_\theta(n)$ among all tests based on the X's and which satisfy
(10.2).

As an example consider the problem of testing the hypothesis $\sigma \le \sigma_0$ against
the alternatives $\sigma \ge \sigma_1$ ($\sigma_0 < \sigma_1$) when the X's are identically, independently
normally distributed with unknown mean and variance. Since the problem
remains invariant under a common translation of the X's we can take for $Y$ of
the theorem $Y = (X_2 - X_1, X_3 - X_1, \ldots)$. Equivalently we may take as our
new sequence of variables $(Y_1, Y_2, \ldots)$ where

$$Y_k = \frac{k X_{k+1} - (X_1 + \cdots + X_k)}{\sqrt{k(k+1)}}. \tag{10.7}$$

Then $Y_1, Y_2, \ldots$ are independently normally distributed with zero mean and
the same variance as the X's. Hence the problem reduces to a type which we have
already considered. The optimum test is based on

$$\sum_{i=1}^{m} y_i^2 = \sum_{i=1}^{m+1} \left( x_i - \frac{x_1 + \cdots + x_{m+1}}{m+1} \right)^2.$$
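The transformation (10.7) and the identity between the sum of squares of the new variables and the centered sum of squares of the original ones can be checked numerically. The sketch below uses hypothetical function names; it takes $Y_k = [k X_{k+1} - (X_1 + \cdots + X_k)] / \sqrt{k(k+1)}$ as the definition and verifies only the algebraic identity, not the distributional claim.

```python
import math


def helmert_sequence(xs):
    """Return [Y_1, ..., Y_m] computed from [X_1, ..., X_{m+1}] by (10.7)."""
    ys = []
    partial_sum = 0.0  # X_1 + ... + X_k
    for k in range(1, len(xs)):
        partial_sum += xs[k - 1]
        ys.append((k * xs[k] - partial_sum) / math.sqrt(k * (k + 1)))
    return ys


def centered_sum_of_squares(xs):
    """Sum of squared deviations of the x's from their own mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)
```

Because each $Y_k$ is a contrast of the X's (its coefficients sum to zero), adding a common constant to every observation leaves the whole sequence unchanged, which is the translation invariance the example relies on.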

It may be worth pointing out that Theorems 3.2, 8.3, 10.2 all are special
cases of simple results in the general theory of statistical decision functions, of
which the following is the prototype. (For a detailed treatment of this theory
see, for example, [30].) Let $\{P_\theta\}$, $\theta \in \Omega$, be the family of possible distributions of
a random variable $X$, and let $\{\delta\}$ be a family of decision functions. The loss
resulting from the use of $\delta(x)$ when $P_\theta$ is the true distribution is $W[\theta, \delta(x)]$ and
the risk function associated with $\delta$ is $R_\delta(\theta) = E_\theta W[\theta, \delta(X)]$. Let $\lambda$ be a probability
measure over $\Omega$, and let $\delta_\lambda$ be a decision function that minimizes $\int R_\delta(\theta)\, d\lambda(\theta)$.
Then if $\lambda$ is such that

$$R_{\delta_\lambda}(\theta) \le \int R_{\delta_\lambda}(\xi)\, d\lambda(\xi) \quad \text{for all } \theta \in \Omega,$$

$\delta_\lambda$ minimizes $\sup_\theta R_\delta(\theta)$.

Proof. Let $\delta^*$ be any other decision function. Then

$$\sup_\theta R_{\delta_\lambda}(\theta) \le \int R_{\delta_\lambda}(\theta)\, d\lambda(\theta) \le \int R_{\delta^*}(\theta)\, d\lambda(\theta) \le \sup_\theta R_{\delta^*}(\theta).$$

In an analogous manner one can give an extension of Theorems 8.1, 9.1, 10.3.
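The prototype result can be illustrated on a toy problem with finitely many parameter values and finitely many decision functions, each represented simply by its vector of risks. The numbers and names below are invented for illustration only; the sketch finds the rule minimizing the average risk under a prior, checks the condition that its risk nowhere exceeds its average risk, and confirms that it then also minimizes the maximum risk.

```python
def average_risk(risk_row, prior):
    """Integral of the risk function with respect to the prior (a finite sum)."""
    return sum(r * p for r, p in zip(risk_row, prior))


def bayes_rule(risks, prior):
    """Name of the decision function with the smallest average risk."""
    return min(risks, key=lambda name: average_risk(risks[name], prior))


def satisfies_condition(risk_row, prior, tol=1e-12):
    """True if the risk at every parameter point is at most the average risk."""
    return max(risk_row) <= average_risk(risk_row, prior) + tol


def minimax_rule(risks):
    """Name of the decision function with the smallest maximum risk."""
    return min(risks, key=lambda name: max(risks[name]))


# Hypothetical risk table: two parameter points, three decision functions.
EXAMPLE_RISKS = {
    "delta_a": [0.1, 0.9],
    "delta_b": [0.4, 0.4],
    "delta_c": [0.8, 0.2],
}
EXAMPLE_PRIOR = [0.5, 0.5]
```

In this example `delta_b` has constant risk, so the condition holds with equality; the condition in the text is weaker and only asks that the risk never exceed its average under the prior.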





11. Two sided tests considered as 3-decision problems. In a number of 
important special problems the hypothesis specifies the value of a real valued 
parameter or states that this parameter lies in a certain interval, and it is desired 
to test this hypothesis against the obvious two-sided class of alternatives. It 
seems that in nearly any problem of this kind that would arise in practice one 
would want to decide when rejecting the hypothesis, whether the true parameter 
value lies below or above the hypothetical ones. If for example one rejects the 
hypothesis that the means of two normal populations are equal, one usually 
wants to decide which of the two is larger. It would therefore seem most natural 
to formulate such problems as 3-decision problems.

Problems of this kind, as all problems of hypothesis testing, naturally are 
special cases of the general decision problem formulated by Wald. We shall here 
consider the case that upper bounds are given for the probabilities of certain 
types of errors and thereby obtain a formulation, which is closely analogous to 
the classical formulation of hypothesis testing discussed in this paper, and which 
will allow immediate application of a large portion of the theory discussed here. 

Consider the case that $\Omega$ is partitioned into 3 parts, $\omega, \omega_1, \omega_2$, where in a certain
sense $\omega$ lies between $\omega_1$ and $\omega_2$. We wish to test the hypothesis $H: \theta \in \omega$. When we
reject the hypothesis, we shall reach either decision $D_1$ that $\theta \in \omega_1$ or decision $D_2$
that $\theta \in \omega_2$. Correspondingly we prescribe two positive numbers $\alpha_1, \alpha_2$ and impose
the restriction that

$$P_\theta(D_1) \le \alpha_1 \ \text{ if } \theta \in \omega + \omega_2, \qquad P_\theta(D_2) \le \alpha_2 \ \text{ if } \theta \in \omega + \omega_1. \tag{11.1}$$

Subject to this condition it is desired to maximize

$$P_\theta(D_1) \ \text{ for } \theta \in \omega_1, \qquad P_\theta(D_2) \ \text{ for } \theta \in \omega_2. \tag{11.2}$$

A test will now consist of two non-negative functions $\varphi_1$ and $\varphi_2$ satisfying

$$\varphi_1(x) + \varphi_2(x) \le 1, \tag{11.3}$$

with the convention that when $X = x$ the decision $D_i$ will be taken with
probability $\varphi_i(x)$ ($i = 1, 2$).

There is no difficulty concerning the extension of the notions of invariance
or sufficient statistic; in fact these notions obviously apply to the general
decision problem. The notion of unbiasedness is extended in the obvious way by the
condition

$$P_\theta(D_1) \ge \alpha_1 \ \text{ for } \theta \in \omega_1, \qquad P_\theta(D_2) \ge \alpha_2 \ \text{ for } \theta \in \omega_2. \tag{11.4}$$

One then obtains the following

Theorem 11.1. Suppose that for testing the hypothesis $H_1: \theta \in \omega + \omega_2$ against
the alternatives $\theta \in \omega_1$ at level of significance $\alpha_1$, the test $\varphi_1$ among all unbiased tests
is uniformly most powerful in $\omega + \omega_1$ and uniformly least powerful in $\omega_2$, and that
$\varphi_2$ has the analogous property for testing $H_2: \theta \in \omega + \omega_1$ against $\theta \in \omega_2$ at significance
level $\alpha_2$. If $\varphi_1(x) + \varphi_2(x) \le 1$ for all $x$, then among all procedures satisfying (11.1)
and (11.4), the procedure $(\varphi_1, \varphi_2)$ uniformly maximizes the probability of a correct
decision. (If the tests $\varphi_1, \varphi_2$ take on only the values 0 and 1, the condition
$\varphi_1(x) + \varphi_2(x) \le 1$ states that the rejection region of each of the two hypotheses is
contained in the acceptance region of the other.)

As an example consider the case that $X_1, \ldots, X_n$ are independently,
normally distributed with common mean $\xi$ and variance $\sigma^2$. Suppose we wish to
test the hypothesis that $\sigma_1 \le \sigma \le \sigma_2$, where $\sigma_1$ may equal $\sigma_2$. Then it follows from
Theorem 11.1 that among all unbiased procedures of level $(\alpha_1, \alpha_2)$, there exists
one that maximizes the probability of a correct decision uniformly in $\xi$ and $\sigma$.
This is the procedure under which decision $D_1$ or $D_2$ is taken as $\sum (x_i - \bar{x})^2 \le k_1$
or $\ge k_2$, and the hypothesis is accepted otherwise. Here the k's are determined by

$$P\left(\sum (x_i - \bar{x})^2 \le k_1 \mid \sigma_1\right) = \alpha_1,$$

$$P\left(\sum (x_i - \bar{x})^2 \ge k_2 \mid \sigma_2\right) = \alpha_2.$$
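A three-decision procedure of this kind for the normal variance can be sketched as follows. Several things here are assumptions made for illustration and are not taken from the paper: $D_1$ is read as the decision that $\sigma$ lies below the hypothesized interval and $D_2$ as the decision that it lies above; the constants are obtained from the fact that $\sum (x_i - \bar{x})^2 / \sigma^2$ has a chi-square distribution with $n - 1$ degrees of freedom; and, to stay within the standard library, the chi-square distribution function is implemented only for an even number of degrees of freedom (odd $n$), where it has a closed form. All function names are hypothetical.

```python
import math


def chi2_cdf_even(x, df):
    """Chi-square CDF for even df: 1 - exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles only positive even degrees of freedom")
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p, df):
    """Invert chi2_cdf_even by bisection; p must lie strictly between 0 and 1."""
    low, high = 0.0, 1.0
    while chi2_cdf_even(high, df) < p:
        high *= 2.0
    for _ in range(200):
        mid = (low + high) / 2.0
        if chi2_cdf_even(mid, df) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def thresholds(n, sigma1, sigma2, alpha1, alpha2):
    """k1 with P(SS <= k1 | sigma1) = alpha1 and k2 with P(SS >= k2 | sigma2) = alpha2."""
    df = n - 1
    k1 = sigma1 ** 2 * chi2_quantile_even(alpha1, df)
    k2 = sigma2 ** 2 * chi2_quantile_even(1.0 - alpha2, df)
    return k1, k2


def decide(xs, k1, k2):
    """Return "D1", "D2", or "accept" from the centered sum of squares."""
    mean = sum(xs) / len(xs)
    ss = sum((x - mean) ** 2 for x in xs)
    if ss <= k1:
        return "D1"
    if ss >= k2:
        return "D2"
    return "accept"
```

The two rejection regions are disjoint whenever $k_1 < k_2$, which corresponds to the requirement that the two test functions never sum to more than 1.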


REFERENCES 

[1] B. de Finetti, "La prévision: ses lois logiques, ses sources subjectives," Annales de
l'Institut Henri Poincaré, Vol. 7 (1937), p. 1.

[2] P. R. Halmos and L. J. Savage, "Applications of the Radon-Nikodym theorem to the
theory of sufficient statistics," Annals of Math. Stat., Vol. 20 (1949), p. 225.

[3] P. L. Hsu, "Analysis of variance from the power function standpoint," Biometrika,
Vol. 32 (1941), p. 62.

[4] P. L. Hsu, "On the power function of the E²-test and the T²-test," Annals of Math.
Stat., Vol. 16 (1945), p. 278.

[5] G. Hunt and C. Stein, "Most stringent tests of statistical hypotheses," unpublished.

[6] E. L. Lehmann, "On families of admissible tests," Annals of Math. Stat., Vol. 18
(1947), p. 97.

“« “-•'““■I." 

[11] E. L. Lehmann and C. Stein, "Most powerful tests of composite hypotheses. I. Normal
distributions," Annals of Math. Stat., Vol. 19 (1948), p. 495.

[12] E. L. Lehmann and C. Stein, "On the theory of some non-parametric hypotheses,"
Annals of Math. Stat., Vol. 20 (1949), p. 28.

""S S;v..%“TS,T» 

^ 

Ml J. Hto„ „„ £ B, ''■ ™- » imn, P. 3S3. 





[17] J. Neyman and E. S. Pearson, "On the testing of statistical hypotheses in relation to
probability a priori," Proc. Camb. Phil. Soc., Vol. 29 (1933), p. 492.

[18] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical
hypotheses," Phil. Trans. Roy. Soc. London, Series A, Vol. 231 (1933), p. 289.

[19] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical
hypotheses. I. Unbiased critical regions of type A and type A1," Stat. Res. Mem.,
Vol. 1 (1936), p. 1.

[20] E. J. G. Pitman, "Tests of hypotheses concerning location and scale parameters,"
Biometrika, Vol. 31 (1939), p. 200.

[21] J. B. Simaika, "On an optimum property of two important statistical tests,"
Biometrika, Vol. 32 (1941), p. 70.

[22] C. Stein and A. Wald, "Sequential confidence intervals for the mean of a normal
distribution with known variance," Annals of Math. Stat., Vol. 18 (1947), p. 427.

[23] A. Wald, "On the power function of the analysis of variance test," Annals of Math.
Stat., Vol. 13 (1942), p. 434.

[24] A. Wald, "On the principles of statistical inference," Notre Dame Math. Lectures,
No. 1 (1942).

[25] A. Wald, "Tests of statistical hypotheses concerning several parameters when the
number of observations is large," Trans. Am. Math. Soc., Vol. 54 (1943), p. 426.

[26] A. Wald, "Statistical decision functions which minimize the maximum risk," Annals
of Math., Vol. 46 (1945), p. 265.

[27] A. Wald, "An essentially complete class of admissible decision functions," Annals of
Math. Stat., Vol. 18 (1947), p. 549.

[28] A. Wald, Sequential analysis, John Wiley and Sons, 1947.

[29] A. Wald, "Foundations of a general theory of sequential decision functions,"
Econometrica, Vol. 15 (1947), p. 279.

[30] A. Wald, "Statistical decision functions," Annals of Math. Stat., Vol. 20 (1949), p. 165.

[31] A. Wald and J. Wolfowitz, "Optimum character of the sequential probability ratio
test," Annals of Math. Stat., Vol. 19 (1948), p. 326.

[32] S. S. Wilks, "The large-sample distribution of the likelihood ratio for testing
composite hypotheses," Annals of Math. Stat., Vol. 9 (1938), p. 60.

[33] J. Wolfowitz, "Additive partition functions and a class of statistical hypotheses,"
Annals of Math. Stat., Vol. 13 (1942), p. 247.

[34] J. Wolfowitz, "Non-parametric statistical inference," Proceedings of the Berkeley
Symposium on mathematical statistics and probability (1949), p. 93.

[35] J. Wolfowitz, "The power of the classical tests associated with the normal
distribution," Annals of Math. Stat., Vol. 20 (1949), p. 540.

Some related papers not referred to in the text. 

[36] T. W. Anderson, "On the theory of testing serial correlation," Skand.
Aktuarietidskrift, (1948), p. 88.

[37] R. A. Fisher, The design of experiments, Oliver and Boyd, 1935.

[38] M. N. Ghosh, "On the problem of similar regions," Sankhyā, Vol. 8 (1948), p. 320.

[39] P. G. Hoel, "Testing the homogeneity of Poisson frequencies," Annals of Math. Stat.,
Vol. 16 (1945), p. 362.

[40] P. G. Hoel, "Discriminating between binomial distributions," Annals of Math. Stat.,
Vol. 18 (1947), p. 556.

[41] P. G. Hoel, "On the uniqueness of similar regions," Annals of Math. Stat., Vol. 19
(1948), p. 66.

[42] E. L. Lehmann, "Some comments on large sample tests," Proceedings of the Berkeley
Symposium on mathematical statistics and probability (1949), p. 451.

[43] H. B. Mann and A. Wald, "On the choice of the number of intervals in the application
of the χ²-test," Annals of Math. Stat., Vol. 13 (1942), p. 306.

[44] J. Neyman, Lectures and conferences on mathematical statistics, Graduate School of the
U. S. Dept. of Agriculture, 1938.

[45] J. Neyman, "Basic ideas and some recent results of the theory of testing statistical
hypotheses," Journal Roy. Stat. Soc., Vol. CV (1942), p. 292.

[46] J. Neyman, "On a statistical problem arising in routine analysis and in sampling
inspection of mass production," Annals of Math. Stat., Vol. 12 (1941), p. 46.

[47] S. N. Roy, "Notes on testing composite hypotheses, I, II," Sankhyā, Vol. 8 (1947), p.
267 and Vol. 9 (1948), p. 19.

[48] H. Scheffé, "On the theory of testing composite hypotheses with one constraint,"
Annals of Math. Stat., Vol. 13 (1942), p. 280.

[49] H. Scheffé, "On the ratio of the variances of two normal samples," Annals of Math.
Stat., Vol. 13 (1942), p. 371.

[50] C. Stein, "A two sample test for a linear hypothesis whose power is independent of
the variance," Annals of Math. Stat., Vol. 16 (1945), p. 243.

[51] A. Wald, "Asymptotically most powerful tests of statistical hypotheses," Annals of
Math. Stat., Vol. 12 (1941), p. 1.

[52] A. Wald, "Some examples of asymptotically most powerful tests," Annals of Math.
Stat., Vol. 12 (1941), p. 396.

[53] A. Wald, "On the efficient design of statistical investigations," Annals of Math. Stat.,
Vol. 14 (1943), p. 134.



SAMPLE CRITERIA FOR TESTING OUTLYING OBSERVATIONS¹

By Frank E. Grubbs

University of Michigan and Ballistic Research Laboratories

¹ This paper has been extracted from a thesis approved for the Degree of PhD at the
University of Michigan.

1. Summary. The problem of testing outlying observations, although an old
one, is of considerable importance in applied statistics. Many and various types
of significance tests have been proposed by statisticians interested in this field
of application. In this connection, we bring out in the Historical Comments
notable advances toward a clear formulation of the problem and important
points which should be considered in attempting a complete solution. In Section
4 we state some of the situations the experimental statistician will very likely
encounter in practice, these considerations being based on experience. For testing
the significance of the largest observation in a sample of size n from a normal
population, we propose the statistic

$$\frac{S_n^2}{S^2} = \frac{\sum_{i=1}^{n-1} (x_i - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$

where $x_1 \le x_2 \le \cdots \le x_n$, $\bar{x}_n = \frac{1}{n-1} \sum_{i=1}^{n-1} x_i$ and $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.

A similar statistic, $S_1^2/S^2$, can be used for testing whether the smallest
observation is too low.

It turns out that

$$\frac{S_n^2}{S^2} = 1 - \frac{1}{n-1} \left( \frac{x_n - \bar{x}}{s} \right)^2 = 1 - \frac{1}{n-1}\, T_n^2,$$

where $s^2 = \frac{1}{n} \sum (x_i - \bar{x})^2$, and $T_n$ is the studentized extreme deviation already
suggested by E. Pearson and C. Chandra Sekar [1] for testing the significance
of the largest observation. Based on previous work by W. R. Thompson [12],
Pearson and Chandra Sekar were able to obtain certain percentage points of
$T_n$ without deriving the exact distribution of $T_n$. The exact distribution of $S_n^2/S^2$
(or $T_n$) is apparently derived for the first time by the present author.

For testing whether the two largest observations are too large we propose the
statistic

$$\frac{S_{n-1,n}^2}{S^2} = \frac{\sum_{i=1}^{n-2} (x_i - \bar{x}_{n-1,n})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \bar{x}_{n-1,n} = \frac{1}{n-2} \sum_{i=1}^{n-2} x_i,$$

and a similar statistic, $S_{1,2}^2/S^2$, can be used to test the significance of the two
smallest observations. The probability distributions of the above sample statistics
are derived, and appropriate percentage points are tabulated. Although the
efficiencies of the above tests have not been completely investigated under various
models for outlying observations, it is apparent that the proposed sample criteria
have considerable intuitive appeal. In deriving the distributions of the sample
statistics for testing the largest (or smallest) or the two largest (or two smallest)
observations, it was first necessary to derive the distribution of the difference
between the extreme observation and the sample mean in terms of the population $\sigma$.
This probability distribution was apparently derived first by A. T. McKay [11], who
employed the method of characteristic functions. The author was not aware of the
work of McKay when the simplified derivation for the distribution of
$(x_n - \bar{x})/\sigma$ outlined in Section 5 below was worked out by him in the spring of
1945, McKay's result being called to his attention by C. C. Craig. It has been noted
also that K. R. Nair [20] worked out independently and published the same
derivation of the distribution of the extreme minus the mean arrived at by the
present author (Biometrika, Vol. 35, May, 1948). We nevertheless include part of
this derivation in Section 5 below as it was basic to the work in connection with
the derivations given in Sections 8 and 9. Our table is considerably more extensive
than Nair's table of the probability integral of the extreme deviation from the
sample mean in normal samples, since Nair's table runs from n = 2 to n = 9, whereas
our Table II is for n = 2 to n = 25. The present work is concluded with some
examples.

TABLE I

Table of Percentage Points for $S_n^2/S^2$ or $S_1^2/S^2$

                Percentage Points
  n       1%       2.5%      5%       10%
  3     .0001     .0007     .0027     .0109
  4     .0100     .0248     .0494     .0975
  5     .0442     .0808     .1270     .1984
  6     .0928     .1453     .2032     .2820
  7     .1447     .2066     .2690     .3503
  8     .1948     .2616     .3261     .4050
  9     .2411     .3101     .3742     .4502
 10     .2831     .3526     .4154     .4881
 11     .3211     .3901     .4511     .5204
 12     .3554     .4232     .4822     .5483
 13     .3864     .4528     .5097     .5727
 14     .4145     .4792     .5340     .5942
 15     .4401     .5030     .5559     .6134
 16     .4634     .5246     .5755     .6306
 17     .4848     .5442     .5933     .6461
 18     .5044     .5621     .6095     .6601
 19     .5225     .5785     .6243     .6730
 20     .5393     .5937     .6379     .6848
 21     .5548     .6076     .6504     .6958
 22     .5692     .6206     .6621     .7058
 23     .5827     .6327     .6728     .7151
 24     .5953     .6439     .6829     .7238
 25     .6071     .6544     .6923     .7319

$S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$ where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

$S_n^2 = \sum_{i=1}^{n-1} (x_i - \bar{x}_n)^2$ where $\bar{x}_n = \frac{1}{n-1} \sum_{i=1}^{n-1} x_i$

$S_1^2 = \sum_{i=2}^{n} (x_i - \bar{x}_1)^2$ where $\bar{x}_1 = \frac{1}{n-1} \sum_{i=2}^{n} x_i$

TABLE IA

Table of Percentage Points for $T_n = \dfrac{x_n - \bar{x}}{s}$ or $T_1 = \dfrac{\bar{x} - x_1}{s}$

  n       1%       2.5%      5%       10%
  3     1.414     1.414     1.412     1.406
  4     1.723     1.710     1.689     1.645
  5     1.955     1.917     1.869     1.791
  6     2.130     2.067     1.996     1.894
  7     2.265     2.182     2.093     1.974
  8     2.374     2.273     2.172     2.041
  9     2.464     2.349     2.237     2.097
 10     2.540     2.414     2.294     2.146
 11     2.606     2.470     2.343     2.190
 12     2.663     2.519     2.387     2.229
 13     2.714     2.562     2.426     2.264
 14     2.759     2.602     2.461     2.297
 15     2.800     2.638     2.493     2.326
 16     2.837     2.670     2.523     2.354
 17     2.871     2.701     2.551     2.380
 18     2.903     2.728     2.577     2.404
 19     2.932     2.754     2.600     2.426
 20     2.959     2.778     2.623     2.447
 21     2.984     2.801     2.644     2.467
 22     3.008     2.823     2.664     2.486
 23     3.030     2.843     2.683     2.504
 24     3.051     2.862     2.701     2.520
 25     3.071     2.880     2.717     2.537

$x_1 \le x_2 \le \cdots \le x_n$; $s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$; $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
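The sample criteria described in the Summary are simple to compute. The sketch below, with hypothetical function names, evaluates the ratio of the sum of squares omitting the largest observation to the total sum of squares, the studentized extreme deviation $T_n = (x_n - \bar{x})/s$ with $s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$, and the corresponding ratio omitting the two largest observations. It also makes it easy to confirm numerically that the first ratio equals $1 - T_n^2/(n-1)$. No percentage points are computed here; those have to be read from the tables.

```python
import math


def largest_observation_criteria(sample):
    """Return (S_n^2 / S^2, T_n) for testing the largest observation."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    total_ss = sum((x - mean) ** 2 for x in xs)  # S^2
    rest = xs[:-1]  # all observations except the largest
    rest_mean = sum(rest) / (n - 1)
    reduced_ss = sum((x - rest_mean) ** 2 for x in rest)  # S_n^2
    s = math.sqrt(total_ss / n)  # divisor n, as in the definition of s^2
    t_n = (xs[-1] - mean) / s
    return reduced_ss / total_ss, t_n


def two_largest_criterion(sample):
    """Return S_{n-1,n}^2 / S^2 for testing the two largest observations."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    total_ss = sum((x - mean) ** 2 for x in xs)
    rest = xs[:-2]  # all observations except the two largest
    rest_mean = sum(rest) / (n - 2)
    return sum((x - rest_mean) ** 2 for x in rest) / total_ss
```

For the smallest observation (or the two smallest) the same functions can be applied to the sample with every sign reversed, since negating the data interchanges the roles of the two ends of the ordered sample.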


2. Introduction. Scientific data are collected usually for purposes of
interpretation and if proper use is to be made of the information thus obtained then some
decision should be reached or some action taken as a result of analyzing the data.
In many cases a critical examination of the data collected is necessary in order
to insure that the results of sampling are representative of the thing or process
we are examining. Quite frequently our observations do not appear to be
consistent with one another, i.e. the data may seem to display non-homogeneities
and the group of observations as a whole may not appear to represent a random
sample from, say, a single normal population or universe. In particular, one or
more of the observations may have the appearance of being "outliers" and we
are interested here in determining once and for all whether such observations
should be retained in the sample for interpreting results or whether they should
be regarded as being inconsistent with the remaining observations. It is clear
that rejection of the "outliers" in a sample will in a great number of cases lead
to a different course of action than would have been taken had such observations
been retained in the sample. Actually, the rejection of "outlying" observations
may be just as much a practical (or common sense) problem as a statistical one
and sometimes the practical or experimental viewpoint may naturally outweigh
any statistical contributions. In this connection, the concluding remarks of

auitloVTr • that the 

question of the rejection or the retention of a discordant observation reduces to 

sho^d bTalwpT°'' judgment of an experienced observer 

1 oi-p 7 probability, but any test which requires an inordinate amount of ealcu- 

riewpomt. ol k , mportoce either m eupporting doubtful praolicl 

mental fcnowledee of iinUp.'lv' ^Pon in the absence of sufficient oxperi- 

In the present treatment, we intend to throw some light beyond the work




TABLE II (Continued)


3 4 s « r * 


VI 
• \ 

2 

2.00 

.99^2 

2,06 

.99626 

2.10 

.99702 

2,16 

.99764 

2 20 

99814 

2,26 

.99864 

2,30 

.99886 

2 35 

.99911 

2.40 

.99931 

2.46 

.99947 

2 50 

.99959 

2.66 

.99969 

2.60 

.99976 

2 65 

.99982 

2.70 

.99987 

2 76 

99990 

2 80 

.99992 

2.86 

.99994 

2.90 

.99996 

2 95 

.99997 

3 00 

.99998 

3 05 

99998 

3.10 

.99999 

3.16 

.99999 

3,20 

.99999 

3.25 

3 30 

3 36 
3,40 
3.45 

3,50 

3.65 

3,60 

3 65 
3.70 

3.75 

3 80 

3 85 
3.90 

3 95 

1.00000 


.97864 

.96818 

.98193 

.96416 

98483 

.96938 

98731 

.97392 

.98942 

.97786 

,99121 

.98125 

.99273 

.98418 

99400 

.98660 

.99507 

.98883 

99696 

.99066 

.99670 

,99222 

.99732 

.99353 

.99782 

,99464 

.99824 

.99657 

.99868 

,99635 

. 99886 

.99701 

.99909 

.99756 

,99928 

.99800 

.99943 

.99838 

.99966 

.99868 

.99964 

,99894 

.99972 

.99914 

.99978 

.99931 

.99983 

.99946 

.99987 

.99966 

99990 

,99966 

99992 

.99972 

99994 

.99978 

.99996 

99983 

.99996 

.99986 

.99997 

.90989 

.99998 

.99992 

.99998 

.99994 

.99999 

.99996 

99999 

.99996 

.99999 

.99997 

1.00000 

99998 


.99998 


.99999 


.99999 


.93582 

.015216 

.946,36 

.92627 

,06289 

.93605 

.05949 

.04468 

.96527 

.osast 

.97032 

.05837 

.97470 

.96482 

.97850 

,06992 

.98178 

.0743,5 

.08461 

07813 

98703 

.08161 

.98011 

.08436 

99088 

.08681 

,99238 

-08891 

.99365 

.99070 

.09473 

.09223 

.99664 

.99352 

,09640 

.90461 

,99704 

.9955.3 

,99767 

,99631 

.99801 

.99096 

.99838 

.99760 

.99868 

.09795 

.09893 

.09832 

.99913 

.99863 

.99930 

.99889 

.90944 

.99910 

.99966 

-99927 

.99964 

.99941 

.99971 

.99063 

.99977 

.09062 

.99982 

.99070 

.90986 

.99978 

.99989 

.99081 

.99091 

.90986 

.09993 

.09988 

.99996 

.09991 

99996 

.99903 

.99997 

.09994 

99997 

.99996 


,83381 

.S7%i 

90721 

.88832 

.91916 

iwaw 


!U4(iO 

'<3917 

OWH 

94746 

jxmi 

95176 

64462 

.96114 

'1.5239 

.145672 

.mm 

.07158 

. 96487 

.97,W) 

.96999 

97914 

.97443 

.982.69 

.97827 

.9K529 

.981.58 

.98761 

.£1811.3 

,989,69 

98688 

.99128 

.98897 

. mw'a 

WHTtfi 

.imia 

,9£«2T 

.99496 

.9935,5 

,99.W2 

.W164 

.99665 

• ■>9.566 

,99716 

.996,32 

.99766 

.99697 

.90808 

.99750 

.99843 

.99796 

.99872 

.99832 

.99896 

.99863 

.09916 

.99889 

.99932 

.99910 

.09946 

.99927 

.99966 

.09941 

.09066 

.W62 

.99972 

.90902 

.99977 

.99909 

.99982 

.99976 

.90986 

.99981 

.99989 

.99986 

.99991 

.99988 

.99993 

.09990 


.WIW ^2 

.wnm a 

«*U'A I'i 

*»!'«■. ia 

■ Mian L 

MliH *2 

.tuaw i‘2 
.yrfiu.'j a 

■urwa |a 

■ iWiia S3 

M'ai 2 
.ur.m h 

.!t77»l {a 

»Hi:H (2 

SlKtll \i 

. j2 

tJKHT'J l2 

.mw k 

.iifan a 

.tHa« 3 
.t»L13 3 

.JHIMO 3 
.!W35 3 

.OTJO 3 

.mi745 3 

.99701 3, 

.B9H2tl 3. 
.g^WiO 3. 
.t>9h«(5 3. 

-Wt^itW 3, 

.mK> 2 s a, 
.«KM0 3, 
.SKKlai 3, 

Mmi 3. 

.wao 3, 
.m>975 3. 

.»!K)H0 3. 
.999M 3. 
.99087 3. 


^SSSS oSSSS 8SSSS SSkSsJ SjksSS ftkSSS SS5SS 



TABLE II (Continued)

 u \ n     2        3        4         5         6         7         8         9
 4.00                      .99999    .99998    .99996    .99996    .99992    .99990
 4.05                      .99999    .99999    .99997    .99996    .99994    .99992
 4.10                     1.00000    .99999    .99998    .99997    .99995    .99994
 4.15                                .99999    .99998    .99997    .99996    .99995
 4.20                                .99999    .99999    .99998    .99997    .99996
 4.25                                .99999    .99999    .99998    .99998    .99997
 4.30                               1.00000    .99999    .99999    .99998    .99998
 4.35                                          .99999    .99999    .99999    .99998
 4.40                                         1.00000    .99999    .99999    .99999
 4.45                                                    .99999    .99999    .99999
 4.50                                                   1.00000    .99999    .99999
 4.55                                                             1.00000    .99999
 4.60                                                                       1.00000

 u \ n     10        11        12        13        14        15        16        17
  .25    .00001    .00000    .00000    .00000    .00000    .00000    .00000    .00000
  .30    .00003    .00001    .00000    .00000    .00000    .00000    .00000    .00000
  .35    .00011    .00004    .00001    .00001    .00000    .00000    .00000    .00000
  .40    .00032    .00013    .00005    .00002    .00001    .00000    .00000    .00000
  .45    .00080    .00036    .00016    .00007    .00003    .00001    .00001    .00000
  .50    .00178    .00086    .00042    .00021    .00010    .00005    .00002    .00001
  .55    .00351    .00185    .00098    .00051    .00027    .00014    .00008    .00001
  .60    .00643    .00363    .00204    .00116    .00065    .00037    .00021    .00012
  .65    .01098    .00657    .00393    .00235    .00141    .00084    .00050    .00030
  .70    .01760    .01113    .00702    .00443    .00279    .00176    .00111    .00070
  .75    .02694    .01780    .01177    .00777    .00614    .00330    .00224    .00148
  .80    .03928    .02707    .01865    .01285    .00880    .00610    .00420    .00289
  .85    .05503    .03938    .02818    .02016    .01442    .01031    .00738    .00527
  .90    .07444    .05610    .04077    .03017    .02232    .01662    .01222    .00901
  .95    .09761    .07448    .06682    .04334    .03305    .02521    .01922    .01460
 1.00    .12462    .09763    .07666    .06000    .04703    .03687    .02889    .02205
 1.05    .15497    .12464    .10008    .08041    .06460    .05190    .04160    .03348
 1.10    .18807    .16503    .12737    .10464    .08696    .07060    .05799    .04702
 1.15    .22520    .18879    .16825    .13203    .11110    .09315    .07800    .06541
 1.20    .26407    .22542    .19240    .16420    .14013    .11957    .10203    .08700
 1.25    .30476    .26442    .22941    .19901    .17263    .14973    .12987    .11264
 1.30    .34666    .30525    .26876    .23662    .20830    .18336    .16140    .14207
 1.35    .38924    .34734    .30992    .27660    .24607    .22005    .19629    .17509
 1.40    .43196    .39011    .36229    .31810    .28721    .26931    .23411    .21135
 1.45    .47430    .43302    .39529    .36082    .32934    .30068    .27433    .26036





TABLE II (Continued)


\'l 

10 

11 

12 

13 

• '\ 





1 60 

,61683 

47666 

.43838 

40408 

1.65 

.65616 

.61720 

.48104 

.44733 

1.60 

.59495 

.65774 

52282 

.49004 

1.66 

63196 

.69668 

.56332 

,63178 

1.70 

.66099 

.63380 

.60221 

.67216 

1.76 

69991 

,66892 

.63926 

.61086 

1.80 

.73063 

,70189 

.67424 

.04763 

1,86 

.76912 

.73264 

.70704 

.68229 

1.90 

.78538 

.76113 

.73768 

.71472 

1.95 

.80946 

.78737 

.76684 

.74486 

2.00 

.83141 

.81140 

.79183 

77269 

2.06 

.86133 

83330 

.81680 

.79824 

2.10 

86932 

.85314 

.83721 

,82166 

2.16 

,88650 

87105 

.86678 

84271 

2 20 

89998 

.88713 

.87440 

.86183 

2.26 

,91290 

.90161 

89021 

.87902 

2,30 

.92437 

.91431 

.90432 

.89441 

2.35 

.93463 

.92568 

.91688 

.90812 

2.40 

.94348 

.93672 

.02799 

.02030 

2.45 

95134 

.94467 

.93781 

.93106 

2.60 

95823 

,96233 

94644 

.94066 

2.65 

96424 

.95912 

.96400 

94887 

2.60 

.96948 

.96604 

.96060 

96616 

2.65 

97401 

.97019 

96635 

.96261 

2.70 

.97793 

.97454 

.07134 

.96802 

2.76 

98131 

.97849 

97665 

.97280 

2 80 

.98422 

.98180 

.97937 

.97693 

2.85 

08671 

.98464 

.98267 

.98048 

2.00 

.98883 

.98708 

.98631 

.98363 

2.95 

.99064 

.98916 

.98766 

.98614 


14 

15 

16 

IT 


.37244 

.34327 

.31630 

.29166 

1.50 

.41696 

.38676 

.35960 

.33434 

1.66 

.48930 

.43040 

.4a342 

.37807 


.60199 

.47384 

.44726 

,42216 

1.66 

.64368 

.61641 

.4905H 

.46602 

1.70 

.58370 

.66773 

.53289 

.S091S 

1.75 

.62204 

.69744 

.67380 

.65108 

l.«) 

.65838 

.63628 

.61297 

.69144 

1.85 

.69264 

.67102 

.66018 

.62992 


.72443 

.70463 

.88616 

.66630 


.76399 

73671 

.71786 

.700-12 

2.00 

78121 

.76463 

.74819 

.73218 

2.06 

.80614 

.70101 

.77614 

70163 

2.10 

.82886 

.81619 

.80174 

.78849 

2.16 

.84941 

.83715 

82605 

.81311 

2.20 

.86796 

.86690 

.84616 

,83545 

2,26 

.88468 

.87484 

.86618 

.86663 

2.30 

.89943 

.89081 

88224 

.87376 

2.36 

.91264 

.90604 

.89748 

,88997 

2.40 

.02436 

.91766 

.91101 

.90440 

2,45 

.93468 

.92883 

,92300 

.01720 

2.60 

.94876 

.93866 

.93367 

.92850 

2.65 

.96172 

,94728 

.04286 

.93844 


.96806 

.96482 

'05098 

.04716 

2.66 

,96471 

,96139 

.06807 

.06475 

2.70 

.96996 

.96709 

.96423 

.96137 

2,76 

.97448 

,97203 

,96067 

.06712 


97839 

.97629 

.97418 

.07208 

2.86 

.98174 

.97996 

.97816 

.07636 


.98462 

.98300 

.98166 

.08003 

2.96 


3.00 

3.05 

3.10 

3,16 

3.20 


.99218 .99092 
.99348 .99242 
.99468 .99369 
.99561 .99476 
.99628 .99566 


3.25 
3 30 
3.35 
3.40 
3.45 


.99694 .99641 
,99748 ,99704 
.99793 ,99757 
.99831 .99801 
.99862 99837 


.98966 

.99134 

.99278 

99400 

.99502 


.98837 

.99026 

.99187 

99323 

.99437 


98708 

.98917 

.90096 

.99246 

.90372 


98678 

.98807 

.99002 

.99167 

.99307 


.98448 

.98697 

.08909 

.09080 

.90241 


.98318 

.08687 

.93810 

.09010 

.09175 


3.00 

3.06 

3.10 

3.16 

3.30 


99588 

99660 

-99720 

,99770 

.99812 


.99634 .99479 90424 
99616 .99669 .99623 
.99682 .99644 .99606 
.99739 .99707 .99676 
.99786 .99760 ,99733 


.99369 

09477 

.99563 

,99644 

.99707 


.99314 

.99431 

.99629 

.99611 

.99680 


3.26 

3.30 

3.36 

3.40 

3.46 













TESTING OUTLYING OBSERVATIONS


TABLE II —Continued 


u \ n     10     11     12     13     14     15     16     17     u










3.60 

.99888 

.99867 

.99846 

.99825 

.99803

.99781 

.09759 

.99737 

3.50 

3.65 

.99909 

.99892 

.99876 

.99867 

.99839 

.09821 

.99803 

.99785 

3.66 

3.60 

.99926 

.99912 

.09898 

.99884 

.99869 

99864 

.098.39 

.99824 

3.60 

3.66 

.99940 

.99929 

.99917 

.99906 

.99894 

.99881 

.09869 

.99867 

3.66 

3.70 

.99962 

.99943 

.99933 

.99924 

.99914 

.99904 

.09894 

.99883 

3.70 

3.76 

.99961 

. 99961 

,99946 

.99938 

.90030 

.09922 

.99914 

.09906 

3.75 

3.80 

.99969 

.99963 

,99967 

.99960 

.99944 

.99937 

.99930 

,99923 

3.80

3.86 

.99976 

.99970 

99965 

.99960 

.99966 

.99949 

.99944 

.09938 

3.86 

3.90 

.99980 

.99976 

.90972 

.99968 

.99964 

.99969 

.99965 

.99960 

3.90 

3.95 

.99984 

.99981 

.99978 

.99974 

.99971 

.99967 

.99964 

.99960 

3.96 

4.00 

.99988 

.99985 

.99982 

99980 

.99077 

.09974 

.99971 

.99968 

4.00 

4.06 

.99990 

.99988 

.99988 

99984 

.99982 

.90979 

.90077 

,09974 

4.06 

4 10 

.99992 

.99991 

.99989 

.99987 

.90986 

.99083 

.99981 

.99970 

4.10 

4.16 

.99994 

.99993 

.99991 

.99990 

.99988 

.99987 

.99985 

.99984 

4.16 

4.20 

.99996 

99994 

.99993 

.99992 

.99991 

.99990 

.99988 

.99987 

4.20 

4.26 

99996 

.99995 

.99995 

.99994 

.99993 

.90092 

.90991 

.09990 

4.26 

4.30 

.99997 

.99996 

.99996 

.99996 

.99994 

.09003 

.90993 

.09992 

4.30 

4.35 

.99998 

.99097 

99997 

09906 

.99996 

.99906 

.09904 

.99903 

4.35 

4.40 

.99998 

.99998 

.99997 

.99997 

.99906 

.09096 

.99996 

,99905 

4.40 

4.46 

.99999 

.99998 

.99998 

.99998 

.99997 

.00997 

.99996 

.99996 

4.46 

4 60 

.99999 

.99999 

99998 

09998 

90998 

.99998 

.99907 

,90997 

4.60 

4.65 

99999 

.99999 

99999 

.00999 

.90998 

90008 

.90098 

.90997 

4.66 

4.60 

.99999 

99999 

.99999 

.99999 

09000 

.90998 

.90008 

.99998 

4.60 

4.66 

1.00000 

.99999 

99999 

.99999 

.99909 

.90999 

.99909 

.99998 

4.66 

4.70 


1.00000 

.99999 

99999 

.99999 

.99909 

.99999 

,99099 

4.70 

4 76 



1.00000 

1.00000 

99999 

.99009 

.99009 

.99999 

4.76 

4.80 





1.00000 

.90099 

.09909 

.99999 

4.80 

4.86 






1 00000 

1.00000 

1.00000 

4,85 

u \ n     18     19     20     21     22     23     24     25     u

.60 


.00000 

,0000 

.0000 

0000 

.0000 

.0000 

.0000 

.60 

66 

lExgiM 

.00001 

,0000 

.0000 

.0000 

.0000 

.0000 

.0000 

.66 

.60 

.00007 

.00004 

.0000 

.0000 

.0000 

.0000 

.0000 

.0000 

.60

.66 

.00018 

.00011 

.0001 

.0000 

0000 

.0000 

.0000 

.0000 

.65 

.70 

.00044 

.00028 

.0002 

.0001 

.0001 

.0000 

.0000 

.0000 

.70 

.76 

00098 

.00065 

,0004 

.0003 

0002 

.0001 

.0001 

.0001 

.76 

80 

.00199 

.00137 

.0009 

.0007 

0004 

.0003 

.0002 

.0001 

.80 

.86 

.00377 

00270 

.0019 

,0014 

0010 

.0007 

.0006 

.0004 

.86 

.90 

.00669 

00494 

0037 

.0027 

.0020 

.0016 

,0011 

.0008 

.90 

.96 

.01118 

.00863 

0066 

.0049 

.0038 

.0029 

.0022 

.0017 

.96 









FRANK E. GRUBBS


TABLE II—Continued 


u \ n     18     19     20     21     22     23     24     25     u


1 00 

.01775 

.01391 

.0109 

.0085 

.0067 

.0062 

.0041 

.00.32 

1.00 

1.05 

.02690 

.02161 

.0174 

.0139 

.0112 

.0090 

.0072 

.0058 

1.0.5 

1.10 

03911 

.03212 

.0204 

.0217 

.0178 

.0146 

.01^1 

.ram 

1.10 

1.16 

.05481 

.04592 

.0386 

.0322 

.0270 

.0226 

.0190 

,0169 

1,15 

1.20 

.07428 

.06338 

.0641 

.0461 

.0394 

.0336 

.0287 

.0244 I 

I 

1.20 

1.26 

.09769 

.08472 

.0735 

.0637 

.0553 

.0479 

.0410 

,0300 1 

1.25 

1.30 

.12604 

.11005 

.0969 

.0853 

.0760 

.0600 

.0581 

.0612 ' 

1..30 

1.36 

.15618 

.13930 

.1242 

.1108 

.0988 

.0882 

.0786 

.0701 

1.35 

1.40 

.19080 

.17225 

.1555 

.1404 

.1267 

.1144 

.1033 

,0932 

1.40 

1.46 

.22848 

.20851 

.1903 

.1736 

.1585 

.1446 

.1320 

.1204 1.46 

1.50 

.26869 

.24761 

.2282 

,2103 

.1938 

.1780 

.16-16 

.1.616 

1.60 

1.56 

.31084 

28899 

.2687 

.2498 

.2322 

.2169 

.2007 

.1866 

1.65 

1.60 

.35430 

33202 

.3111 

.2916 

.2732 

.2560 

.2399 

.2248 

1.60 

1.65 

.39846 

37607 

,3549 

.3349 

.3162 

.2984 

.2810 

.2658 

1,65 

1.70 

.44269 

.42052 

.3094 

.3794 

,3604 

.3424 

,3252 

,3080 

1.70 

1.75 

.48646 

46476 

.4440 

.4242 

.4053 

.3872 

.3699 

,3634 

1.76 

1.80 

.52924 

.60827 

4881 

.4687 

.4602 

.4323 

.4162 

.3087 

1.80

1.85 

.67066 

.55058 

.6312 

.6126 

.4945 

.4771 

.4003 

.4441 

1.85 

1.90 

.61031 

69130 

.6729 

.6549 

.6377 

.6209 

.60-17 

.4890 

1.00 

1.95 

64796 

.63011 

.6127 

.6968 

.6794 

.6634 

.6479 

.6328 

1.95 

2 00 

.68340 

.66678 

.6506 

.6348 

.6193 

.6042 

.6896 

.6762 

2.00 

2.06 

71650 

.70114 

.6861 

,6714 

.6670 

.6429 

.6291 

.6156 

2,06 

2.10 

.74719 

.73311 

.7193 

.7068 

.6924 

.6703 

.6666 

.0640 

2.10 

2.15 

.77545 

.76262 

7600 

.7376 

.7264 

,7133 

.7016 

.6899 

2.16 

2.20 

.80132 

.78971 

.7782 

7870 

.7668 

.7448 

.7340 

.7234 

2.20 

2.25 

.82486 

.81440 

8041 

7938 

.7838 

.7738 

.7640 

.7643 

2.26 

2 30 

84616 

.83679 

.8275 

.8184 

.8093 

.8003 

.7914 

.7827 

2.30 

2.35 

.86533 

.86699 

8487 

.8405 

.8324 

.8244 

.8164 

.8085 

2.35 

2.40 

.88251 

.87511 

.8678 

.8605 

.8533 

.8461 

.8390 

.8319 

2.40 

2.45 

.89783 

89129 

8848 

.8784 

.8720 

.8656 

.8693 

.8630 

3.46 

2.60 

.91142 

.90668 

.9000 

,8943 

.8887 

.8831 

.8776 

.8719 

2.60 

2.55 

92345 

.91842 

.9134 

.9084 

.9036 

.8986 

,8036 

,8888 

2.66 

2 60 

.93404 

,92965 

.9263 

.9209 

.9166 

.9123 

.9080 

.9037 

2.60 

2.65 

.94332 

.93961 

.9357 

.9319 

.9282 

.9244 

.9207 

.9169 

3.85 

2.70 

.95144 

.94814 

.9448 

.9416 

.9382 

.9361 

,9318 

,9286 

2.70 

2.76 

.95862 

.96567 

.9528 

9500 

.9472 

.9144 

.9416 

,9387 

2.76 

2.80 

.96466 

.96220 

.9598 

.9573 

.9649 

.9524 

.9800 

.9470 

2.80 

2 85 

96997 

.96787 

9658 

.9637 

.9616 

9596 

.9674 

.9563 

2,85 

2.90 

.97466 

.97275 

9710 

.9692 

.9674 

.0656 

.9638 

.9620 

2,90 

2.05 

.97850 

.97696 

.9754 

,9739 

,9724 

.9709 

.9693 

.9678 

2,96 







TABLE II —Continued 


u \ n     18     19     20     21     22     23     24     25     u


3.00 

98187 

.98057 

.9793 

.9780 

.9767 

.9763 

.9741 

.9728 

3.00 

3.06 

.98476 

98366 

.9825 

.9814 

.9803 

,9793 

.9781 

.0771 

3.05 

3.10 

.98722 

.98629 

.9853 

.9844 

9836 

.9826 

0810 

,9807 

3.10 

3.16 

,98931 

.98352 

.9877 

.9869 

9862 

.9853 

.9840 

.9a3R 

3.16 

3.20 

,99108 

.99042 

.9898 

.9891 

9884 

.9878 

.9871 

986.6 

3.20 

3.26 

.99268 

.99202 

.9915 

.9909 

.9904 

.0898 

.0803 

.9887 

3.26 

3.30 

.99384 

.99337 

.9929 

.9924 

.9920 

.9915 

.0011 

.9906 

3.30 

3.36 

.99490 

.99451 

.9941 

.9937 

.9033 

.9930 

0920 

.9922 

3.36 

3.40 

.99579 

.99646 

.9951 

.9948 

.9945 

.0942 

9939 

.9030 

3.40 

3.46 

99663 

.99626 

.9960 

9957 

.9955 

.9952 

9949 

.9947 

3.46 

3.60 

.99716 

.99093 

.9967 

.9906 

.9903 

.0961 

.0958 

.9950 

3.60 

3.66 

.99766 

.99748 

.9973 

.9971 

.9969 

.9908 

.0900 

.9904 

3.65 

3.60 

.99809 

.99794 

.9978 

.9976 

.9976 

9973 

.9972 

.0971 

3.60 

3.66 

.99844 

99832 

.9982 

.9981 

.9979 

.9978 

9977 

.9970 

3.05 

3.70 

.99873 

.99863 

.9985 

.9984 

.9983 

.9982 

.9982 

.9981 

3.70 

3.76 

.99897 

.99889 

.9988 

.9987 

.9986 

.9980 

.9985 

9984 

3.76 

3.80 

.99917 

.99910 

.9990 

.9990 

.9989 

.9988 

.9988 

.9988 

3.80 

3.85 

.99933 

.99927 

.9992 

.9992 

.9991 

.9991 

.9990 

.9990 

3.85 

3.90 

.99946 

.99941 

.9994 

.9093 

.9993 

.9993 

.0992 

.9992 

3.00 

3.96 

.99966 

.99963 

.9996 

.9996 

.9994 

.9994 

.9904 

.9994 

3.06 

4.00 

.99965 

.99962 

.9996 

.9990 

.9995 

.9995 

.9995 

.9996 

4.00 

4.06 

.99972 

.99009 

9997 

.9996 

.9996 

.9990 

.9996 

.9996 

4.06 

4.10 

.99977 

.99975 

.9997 

9997 

.9997 

.9997 

.9997 

.9997 

4.10 

4 16 

.99982 

99980 

.9998 

.9998 

.9998 

.9998 

.9908 

.9998 

4.16 

4.20 

,99986 

.99984 

.9998 

.9998 

.9998 

.9998 

.9998 

.9998 

4.20 

4.26 

.99989 

99987 

.9999 

,9999 

.9999 

.9099 

.9099 

.9999 

4.26 

4.30 

.99991 

,99990 

.9999 

.9999 

.9999 

.9999 

.9909 

.9999 

4.30 

4.35 

.99993 

.99992 

.9990 

.9999 

.9999 

.9090 

.9909 

.9999 

4.36 

4.40 

.99994 

.99994 

.9999 

.9999 

.9999 

.9099 

.0999 

.9999 

4.40 

4.46 

.99996 

99995 

1.0000 

.9999 

.9999 

.9999 

.0909 

.9999 

4.45 

4.60 

.99996 

.99990 


1.0000 

1.0000 

1.0000 

.9009 

.0999 

4.60 

4.66 

.99997 

.99997 





1.0000 

1 0000 

4.66 

4.60 

.99998 

.99997 







4.00 

4.66 

.99998 

,99998 







4.06 

4.70 

.99998 

.99998 







4,70 

4.76 

.99999 

.99998 







4.76 

4,80 

.99999 

.99999 







4.80 

4 86 

.99999 

.99999 







4.86 

4.90 

1 00000 

1.00000 







4.90 




that has already been done [1], [2], [3], [4], [11], [12], [20] on the problem of testing
outlying observations statistically and to see just where our contributions
fit into this corner of mathematical statistics. First, however, we give a very
brief history of the problem.


3. Historical comments. A survey of statistical literature indicates that the
problem of testing the significance of outlying observations received considerable
attention prior to 1937. Since this date, however, published literature on the
subject seems to have been unusually scant, perhaps because of inherent difficulties
in the problem as pointed out by E. S. Pearson and C. Chandra Sekar [1].
These authors made some important contributions to the problem of outlying
observations by bringing clearly into the foreground the concept of efficiency of
tests which may be used in view of admissible alternative hypotheses.

In 1933, P. R. Rider [2] published a rather comprehensive survey of work on
the problem of testing the significance of outlying observations up to that date.
The test criteria surveyed by Rider appear to impose as an initial condition that
the standard deviation, σ, of the population from which the items were drawn
should be known accurately. In connection with such tests requiring accurate
knowledge of σ, we mention (1) Irwin's criteria [3] which utilize the difference
between the first two individuals or the difference between the second and third
individuals in random samples from a normal population and (2) the range¹ or
maximum dispersion [4], [5], [6], [7], [8], [9], [10], [18] of a sample which has been
advocated by “Student” [4] and others for testing the significance of outlying
observations. We remark further that a natural statistic to use for testing an
“outlier” is the difference between such an extreme observation and the sample
mean. In 1935, McKay [11] published a note on the distribution of the
last-mentioned statistic and by means of a rather elaborate procedure obtained a
recurrence relation between the distribution of the extreme minus the mean in
samples of n from a normal universe and the distribution of this statistic in
samples of n - 1 from the same parent. McKay gave also an approximate expression
for the upper percentage points of the distribution but did not tabulate the
[...] of the multiple integrals involved.

McKay pointed out that if $K_p$ denotes the p-th semi-invariant of the distribution
of $x_n - \bar{x}$ (where $x_n$ is the largest observation) and $K'_p$ refers similarly to the
distribution of $x_n$, then $K_1 = K'_1$, $K_2 = K'_2 - \sigma^2/n$, and $K_p = K'_p$ for $p \ge 3$.

[...] tabulated the distribution of the difference between the extreme and
sample mean for n = 2 to n = 9.

¹ [...] reference [9], 1942; however, Dederick of the Ballistic Research Laboratories
also derived the exact distribution of the range in an unpublished Aberdeen
Proving Ground Report (1926).

In some circumstances, accurate knowledge concerning σ may be available
as, for example, in using “daily control” tests [4], [18] the population standard
deviation may be estimated in some cases with sufficient precision from past
data. In general, however, an accurate estimate of σ may not be available and
it becomes necessary to estimate the population standard deviation from the
single sample involved or “Studentize” [18], [20] the statistic to be used, thus
providing a true measure of the risks involved in the significance test advocated
for testing outlying observations. W. R. Thompson [12] apparently had this very
point in mind when he devised an exact test in his paper, “On a Criterion for
the Rejection of Observations and the Distribution of the Ratio of the Deviation
to the Sample Standard Deviation,” which appeared in 1935. Thompson showed
that if



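$$\tau = \frac{x_i - \bar{x}}{s},$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$ and $x_i$ is an observation selected
arbitrarily from a random sample of n items drawn from a normal parent, then the
probability density function of

$$t = \frac{\tau\sqrt{n-2}}{\sqrt{n-1-\tau^2}}$$

is given by “Student's” t-distribution with f = n - 2 degrees of freedom.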
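
The transformation just described can be illustrated with a short Python sketch. It assumes the reading of Thompson's statistic as τ = (x_i − x̄)/s with s² computed using divisor n, and t = τ√(n−2)/√(n−1−τ²); the function name `thompson_t` and the sample values are hypothetical and serve only as an illustration.

```python
import math

def thompson_t(sample, i):
    """Return (tau, t) for the i-th observation (hypothetical helper).

    tau = (x_i - mean) / s, where s^2 uses divisor n (an assumption
    taken from the definition in the text), and
    t = tau * sqrt(n - 2) / sqrt(n - 1 - tau^2).
    """
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    tau = (sample[i] - mean) / s
    # tau^2 <= n - 1 always; equality holds only when all the other
    # observations coincide, in which case t is undefined.
    t = tau * math.sqrt(n - 2) / math.sqrt(n - 1 - tau ** 2)
    return tau, t

# Illustrative data only (not from the paper).
tau, t = thompson_t([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], 7)
print(tau, t)
```

The resulting t would then be referred to a “Student” table with n − 2 degrees of freedom, bearing in mind that this applies to an arbitrarily chosen observation and not to the largest one.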

Pearson and Chandra Sekar have given a rather comprehensive study of
Thompson's criterion in an interesting and important paper [1] which appeared
in 1936. They discussed also some very important viewpoints which should be
taken into consideration when dealing with the problem of testing outlying
observations. By setting up alternatives to the null-hypothesis $H_0$ that all items
in the sample come from the same population, Pearson and Chandra Sekar point
out that if only one of the observations actually came from a population with
divergent mean, then Thompson's criterion would be very useful, whereas if
two or more of the observations are truly outlying then the criterion $|x_i - \bar{x}| > \tau_0 s$
may be quite ineffective, particularly if the sample contains less than about
30 or 40 observations.

A point of major interest concerning Thompson's work nevertheless is that he
proposed an exact test for the hypothesis that all of the observations came from
the same normal population. With regard to the use of an arbitrary observation
in Thompson's test, however, it should be borne in mind that the problem of
finding the probability that an arbitrary observation will be outlying is different
from that of finding the probability that a particular observation (the largest,
for example) will be outlying with respect to the other n - 1 observations of
the sample.

As a final point concerning the paper of Pearson and Chandra Sekar [1], we
see that for the n values of $T_i$, arranged in order of magnitude taking account of
sign, say

$$T^{(1)}, T^{(2)}, \ldots, T^{(n)},$$

then

[...]

The above authors show that the form of the total distribution of all the $T_i$
at its extremes depends only on $T^{(1)}$ and $T^{(n)}$. This is because for some combinations
of sample size and percentage points the algebraic upper limit for $T^{(2)}$ and
algebraic lower limit for $T^{(n-1)}$ do not extend into the “tails” of the distribution.
Hence, the following probability law holds for $T^{(1)}$ when $T^{(1)} >$ the
algebraic maximum of $T^{(2)}$:

$$p\{T^{(1)}\} = n\,p(T).$$

Likewise,

$$p\{T^{(n)}\} = n\,p(T)$$

for $T^{(n)} <$ algebraic minimum of $T^{(n-1)}$. Therefore, Pearson and Chandra Sekar
were able to use Thompson's table [12] and give (for some sample sizes) upper
probability limits for $T^{(1)} = (x_n - \bar{x})/s$ for the highest observation and lower
probability limits for $T^{(n)} = (x_1 - \bar{x})/s$ for the lowest observation without actually
requiring the exact probability distribution of $T^{(1)}$ and $T^{(n)}$. Hence, the appearance of
the table of percentage points on page 318 of their paper [1] was a substantial
contribution to the problem of testing outlying observations since an exact test
for the significance of a single outlying observation was provided for the case
where an accurate estimate of σ is not available. (The exact distribution of $T^{(1)}$
or $T^{(n)}$ is derived later in this work.)

With the above highlights of historical background in mind, we turn now to a
consideration of the types of problems the experimenter may be faced with in
testing “outlying” observations.


4. Statement of hypotheses in tests of outliers. Once the sample results of
an experiment are available, the practicing statistician may be confronted with
one or more of the following distinct situations as regards discordant observations:
[...] A frequent or perhaps prevalent situation is that
the [...] observation in a sample may have
the appearance of belonging to a different population than the one from which
the remaining observations were drawn. Here we are confronted with tests for
[...] testing the hypothesis that both the largest
and the smallest observations [...] making a decision [...]
as not being representative of the [...] sample.





As to why the discordant observations in a sample may be outliers, this may 
be due to errors of measurement in which case we would naturally want to reject 
or at least “correct” such observations. On the other hand, it may be that the
population we are sampling is not homogeneous in the uni-modal sense and it 
will consequently be desirable to know this so that we may carry out further 
development work on our product if possible or desirable. 

Although there may be many models for outliers, we believe that an important
practical case involves the situation where all the observations in the sample
may be subject to the same standard error, whereas it may happen that the
largest or smallest observations result from shifts in level. For example, if one
observation appears unusually high compared to the others in the sample we
may want to consider the hypothesis that all the observations come from a
normal parent with mean μ and standard deviation σ as against the alternative
hypothesis that the largest observation comes from a normal population with
mean μ + λσ (λ > 0) and standard deviation σ, whereas the remaining observations
are from N(μ, σ).

Another case involves the situation where the largest and/or smallest observations
may be from N(μ, λσ), λ > 1, whereas the remaining observations of
the sample are from the normal parent N(μ, σ).

Although we have not investigated the power of the tests proposed herein for
various models, it is believed that the exact test of Section 8 for the largest (or
smallest) observation and the test of Section 9 for the two largest (or two smallest)
observations possess considerable intuitive appeal for the practical situations
described above.*


* The author is indebted to J. W. Tukey and S. S. Wilks for calling attention to an
incorrect distribution function in the originally submitted manuscript on which several
yet-to-be proved or disproved statements concerning optimum properties of statistics in
this paper were based.

5. Distribution of the difference between the extreme and mean in samples
of n from a normal population. The simultaneous density function of n independent
observations from a normal parent with zero mean and variance σ²
which are arranged in order of magnitude is given by

(1) $$dF(x_1, \ldots, x_n) = \frac{n!}{(2\pi)^{n/2}\sigma^n} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2\right\} dx_1\, dx_2 \cdots dx_n$$

subject to $x_1 \le x_2 \le \cdots \le x_n$.

Since

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n}{n-1}(x_n - \bar{x})^2 + \sum_{i=1}^{n-1} (x_i - \bar{x}_n)^2,$$

where

$$\bar{x}_n = \frac{1}{n-1}\sum_{i=1}^{n-1} x_i,$$

then

(2) $$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n}{n-1}(x_n - \bar{x})^2 + \frac{n-1}{n-2}(x_{n-1} - \bar{x}_n)^2 + \frac{n-2}{n-3}(x_{n-2} - \bar{x}_{n,n-1})^2 + \cdots + \frac{3}{2}\left(x_3 - \frac{x_1 + x_2 + x_3}{3}\right)^2 + \frac{2}{1}\left(x_2 - \frac{x_1 + x_2}{2}\right)^2,$$

where

$$\bar{x}_{n,n-1} = \frac{1}{n-2}\sum_{i=1}^{n-2} x_i, \text{ etc.,}$$

and consequently we find that we are particularly interested in the following
Helmert orthogonal transformation:

(3)
$$\sqrt{2\cdot 1}\,\sigma\eta_2 = -x_1 + x_2,$$
$$\sqrt{3\cdot 2}\,\sigma\eta_3 = -x_1 - x_2 + 2x_3,$$
$$\cdots$$
$$\sqrt{n(n-1)}\,\sigma\eta_n = -x_1 - x_2 - x_3 - \cdots - x_{n-1} + (n-1)x_n,$$
$$\sqrt{n}\,\sigma\eta_{n+1} = x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n.$$
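
A small numerical sketch may help in reading transformation (3). It assumes the reading √(r(r−1)) σ η_r = −x_1 − ... − x_{r−1} + (r−1) x_r for r = 2, ..., n, and checks that the sum of the η_r² reproduces the sum of squared deviations divided by σ², and that √((n−1)/n) η_n equals (x_n − x̄)/σ. The helper name `helmert_etas` and the data are hypothetical.

```python
import math

def helmert_etas(xs_sorted, sigma=1.0):
    """Return [eta_2, ..., eta_n] for observations sorted in ascending order.

    Hypothetical helper: eta_r = ((r-1)*x_r - (x_1 + ... + x_{r-1}))
                                 / (sigma * sqrt(r*(r-1))).
    """
    etas = []
    for r in range(2, len(xs_sorted) + 1):
        numerator = (r - 1) * xs_sorted[r - 1] - sum(xs_sorted[: r - 1])
        etas.append(numerator / (sigma * math.sqrt(r * (r - 1))))
    return etas

# Illustrative data only (not from the paper).
xs = [1.0, 2.0, 4.0, 7.0]
n = len(xs)
mean = sum(xs) / n
etas = helmert_etas(xs)
sum_sq_dev = sum((x - mean) ** 2 for x in xs)

print(sum(e * e for e in etas), sum_sq_dev)           # the two sums agree
print(math.sqrt((n - 1) / n) * etas[-1], xs[-1] - mean)  # extreme minus mean
```

Because the observations are ordered, the η_r computed this way are non-negative, which is the origin of the restricted region of integration used in what follows.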


The above transformation will lead to the distribution of the difference between
the extreme and sample mean in terms of the unknown population σ for
samples of n from a normal parent. Since, however, K. R. Nair (Biometrika,
May, 1948) has already published the details independently, we will only record
here for later reference that the density function of $\eta_2, \eta_3, \ldots, \eta_n$ (after
integrating $\eta_{n+1}$ over $-\infty < \eta_{n+1} < +\infty$) is

(4) $$dF(\eta_2, \eta_3, \ldots, \eta_n) = \frac{n!}{(2\pi)^{(n-1)/2}} \exp\left\{-\frac{1}{2}\sum_{i=2}^{n} \eta_i^2\right\} d\eta_2\, d\eta_3 \cdots d\eta_n,$$

where the $\eta_i$ are restricted by the relations

(5) $$\infty > \eta_2 \ge 0, \qquad \eta_r \ge \sqrt{\frac{r-2}{r}}\,\eta_{r-1}, \quad r = 3, 4, \ldots, n.$$

Upon making the transformations

(6) $$u_r = \frac{\sqrt{r(r-1)}}{r}\,\eta_r, \qquad r = 2, 3, \ldots, n,$$

so that $u_n = (x_n - \bar{x})/\sigma$, defining

(7) $$F_n(u) = \int_0^u dF(u_n) = \text{probability } u_n \le u,$$

and integrating the $u_r$ over their appropriate ranges we find the cumulative
probability integrals of the extreme deviation from the sample mean (in terms
of the population σ) for n = 2, 3, ... to be

$$F_2(u) = 2\sqrt{2}\int_0^u \frac{1}{\sqrt{2\pi}}\, e^{-x^2}\, dx = \frac{2}{\sqrt{\pi}}\int_0^u e^{-x^2}\, dx,$$

a well-known result, where for n = 2, u is either the sample standard deviation,
the difference between the extreme and sample mean, the mean deviation or the
semi-range, and

(8) $$F_n(u) = \frac{n\sqrt{n}}{\sqrt{n-1}} \int_0^u \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\frac{n}{n-1}x^2}\, F_{n-1}\!\left(\frac{n}{n-1}\,x\right) dx.$$

This is equivalent to the result of McKay [11], although the derivation indicated
is a considerably simpler one.

Now $F_{n-1}(u)$ increases from 0 to 1 as u increases from 0 to ∞. Hence, if
$F_{n-1}\!\left(\frac{n}{n-1}u\right)$ is practically unity, i.e. for $\frac{n}{n-1}u$ numerically large, the
upper percentage points of $u_n$ may be approximated by the normal integral

(9) $$P\{u_n > u\} \approx \frac{n\sqrt{n}}{\sqrt{n-1}} \int_u^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\frac{n}{n-1}u_n^2\right\} du_n = \frac{n}{\sqrt{2\pi}} \int_{\sqrt{n/(n-1)}\,u}^{\infty} \exp\left\{-\frac{t^2}{2}\right\} dt.$$

Formula (9) was found to be particularly useful in checking the higher probabilities
in Table II.
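
As a rough illustration of how the normal approximation (9) can be used, the following Python sketch inverts it to obtain an approximate upper percentage point. It assumes the reading of (9) as n times the upper tail of a standard normal variate at u√(n/(n−1)); the function name `approx_upper_point` is hypothetical, and `statistics.NormalDist` from the Python standard library is used for the inverse normal integral.

```python
import math
from statistics import NormalDist

def approx_upper_point(n, alpha):
    """Approximate u with P{extreme minus mean > u*sigma} = alpha.

    Hypothetical helper based on the assumed form of (9):
    n * P{Z > u * sqrt(n / (n - 1))} = alpha, with Z standard normal.
    The approximation is intended for small alpha only.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / n)
    return z * math.sqrt((n - 1) / n)

print(approx_upper_point(10, 0.01))  # compare with the 99% point for n = 10 in Table III
print(approx_upper_point(2, 0.05))   # compare with the 95% point for n = 2 in Table III
```

For n = 2 the relation holds exactly, since the extreme minus the mean is then half the absolute difference of the two observations; for larger n it should be regarded only as a check on the extreme upper tail.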

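The cumulative distribution functions (8) may be put into another form by
setting

$$\xi_r = r\,u_r, \qquad r = 2, 3, \ldots, n.$$

Then $F_n(u)$ becomes

(10) $$F_n(u) = \frac{\sqrt{n}}{(2\pi)^{(n-1)/2}} \int_0^{nu}\!\int_0^{\xi_n} \cdots \int_0^{\xi_3} \exp\left\{-\frac{1}{2}\sum_{r=2}^{n} \frac{\xi_r^2}{r(r-1)}\right\} d\xi_2 \cdots d\xi_{n-1}\, d\xi_n.$$

Define the following functions:

$$H_1(x) = 1,$$

$$H_2(x) = \sqrt{\frac{2}{2-1}} \int_0^x \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\frac{\xi^2}{2(2-1)}\right] H_1(\xi)\, d\xi,$$

$$\cdots$$

$$H_n(x) = \sqrt{\frac{n}{n-1}} \int_0^x \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\frac{\xi^2}{n(n-1)}\right] H_{n-1}(\xi)\, d\xi.$$

Hence, the probability that the difference between the extreme and the mean
in samples of n from a normal population is less than uσ is given by the alternative
forms

$$P\{x_n - \bar{x} < u\sigma\} = F_n(u) = H_n(nu).$$

Of course, $H_n(nu) \to 1$ as $u \to \infty$ for any given n.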

In the November 1945 issue of Biometrika, Godwin [13] arrived at a series
of functions closely related to the $H_r(x)$ in connection with the distribution of
the mean deviation in samples of n from a normal parent. In Godwin's work,
he defines functions $G_r(x)$ which are related to the $H_r(x)$ by the equation

{2^7'^ HrM = Grix), 

The $G_r(x)$ functions were computed by H. O. Hartley [15] for r = 2, 3, ..., 9
only. Computations on the functions $F_n(u)$, i.e. (8), were well under way by
the author before Godwin's article on the mean deviation appeared. The $H_r(x)$
or $G_r(x)$ can be used to obtain both the distribution of the difference between
the extreme and mean and also the probability integral of the mean deviation.
Indeed, it is believed that these functions may have a useful place in tabulating
distributions of order statistics.
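
The recursive definition of the $H_r(x)$ lends itself to simple numerical quadrature. The sketch below is only an illustration, not the procedure actually used for Table II: it assumes the recursion $H_1(x) = 1$, $H_r(x) = \sqrt{r/(r-1)} \int_0^x (2\pi)^{-1/2} \exp[-\xi^2/(2r(r-1))]\, H_{r-1}(\xi)\, d\xi$ together with $F_n(u) = H_n(nu)$, and it uses a plain trapezoidal rule. The function name `extreme_minus_mean_cdf` and the step size are hypothetical choices.

```python
import math

def extreme_minus_mean_cdf(n, u, step=0.002):
    """Approximate F_n(u) = H_n(n*u) by trapezoidal quadrature.

    Hypothetical sketch under the assumed recursion
    H_r(x) = sqrt(r/(r-1)) * integral_0^x phi_r(xi) * H_{r-1}(xi) d(xi),
    phi_r(xi) = exp(-xi^2 / (2 r (r-1))) / sqrt(2 pi), with H_1 = 1.
    """
    upper = n * u
    m = max(1, int(math.ceil(upper / step)))
    h = upper / m
    grid = [i * h for i in range(m + 1)]
    H = [1.0] * (m + 1)  # H_1(x) = 1 on the whole grid
    for r in range(2, n + 1):
        c = math.sqrt(r / (r - 1)) / math.sqrt(2.0 * math.pi)
        g = [math.exp(-x * x / (2.0 * r * (r - 1))) * Hx for x, Hx in zip(grid, H)]
        running = [0.0] * (m + 1)
        for i in range(1, m + 1):
            # cumulative trapezoidal rule gives H_r at every grid point
            running[i] = running[i - 1] + 0.5 * h * (g[i - 1] + g[i])
        H = [c * v for v in running]
    return H[m]

print(extreme_minus_mean_cdf(2, 1.386))
print(extreme_minus_mean_cdf(3, 1.738))
```

Evaluating the function at a tabulated percentage point (for example the 95% points for n = 2 and n = 3 in Table III) gives a quick consistency check of the recursion against the tables.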


6. Tabulation of the distribution function, $F_n(u)$.

The tabulation of the $F_n(u)$ with ordinary computing equipment is quite
laborious. However, a table model computing machine was used initially to
obtain the $F_n(u)$ for n = 2 to n = 15 using formulae (8) and a numerical quadrature
process.*

* Due to problems of higher priority, these functions were not computed on the ENIAC
until March, 1948.

[...] usefulness of the $H_r(x)$, these functions were
[...] computing device, the
ENIAC (Electronic Numerical Integrator and Computor) of the Ballistic Research
Laboratories of the Ordnance Department. In this connection, the $H_r(x)$
have been computed for r = 2 to r = 25 at the Ballistic Research Laboratories.
For n = 2, the functions $H_r(x)$ were computed to nine decimal places of accuracy
on the ENIAC and at n = 25 about five decimal places of accuracy
were obtained. In Table II we have tabulated $F_n(u)$ or $H_n(nu)$, i.e. the probability
integral of the extreme minus the mean, at intervals of u = .05σ. Values
computed on the table model computing machine agreed to five decimal places
at n = 15 with values from the ENIAC. Percentage points of the distribution
are given in Table III and the moment constants may be found in Table IV.
Moment constants for n = 60, 100, 200, 500 and 1000 were obtained by use
of McKay's formulae [11] (which relate the semi-invariants of $x_n - \bar{x}$ with
those of $x_n$) and Tippett's moments [5] for the largest observation $x_n$.


TABLE III

Percentage Points for Extreme Minus Mean

  n      90%      95%      99%     99.5%
  2     1.163    1.386    1.821    1.985
  3     1.497    1.738    2.215    2.396
  4     1.696    1.941    2.431    2.618
  5     1.835    2.080    2.574    2.764
  6     1.939    2.184    2.679    2.870
  7     2.022    2.267    2.761    2.952
  8     2.091    2.334    2.828    3.019
  9     2.150    2.392    2.884    3.074
 10     2.200    2.441    2.931    3.122
 11     2.245    2.484    2.973    3.163
 12     2.284    2.523    3.010    3.199
 13     2.320    2.557    3.043    3.232
 14     2.352    2.589    3.072    3.261
 15     2.382    2.617    3.099    3.287
 16     2.409    2.644    3.124    3.312
 17     2.434    2.668    3.147    3.334
 18     2.458    2.691    3.168    3.355
 19     2.480    2.712    3.188    3.375
 20     2.500    2.732    3.207    3.393
 21     2.519    2.750    3.224    3.409
 22     2.538    2.768    3.240    3.425
 23     2.555    2.784    3.255    3.439
 24     2.571    2.800    3.269    3.453
 25     2.587    2.815    3.282    3.465





TABLE IV

Moment Constants for Extreme Minus Mean

    n      Mean     Std. Dev.     α₃        α₄
    2      .5642     .4263      .9953     3.8692
    3      .8463     .4755      .8296     3.7135
    4     1.0294     .4916      .7675     3.6717
    5     1.1630     .4974      .7372     3.6.500
    6     1.2672     .4993      .7165     3.6511
    7     1.3522     .4991      .7042     3.6503
    8     1.4236     .4979      .6959     3.6518
    9     1.4850     .4962      .6900     3.6546
   10     1.5388     .4943      .6857     3.6582
   11     1.5864     .4923      .6827     3.6622
   12     1.6292     .4902      .6804     3.6663
   13     1.6680     .4881      .6788     3.6705
   14     1.7034     .4861      .6777     3.6746
   15     1.7359     .4841      .6770     3.6787
   20     1.867      .475       .677      3.700
   60     2.319      .436       .699      3.801
  100     2.508      .418       .712      3.855
  200     2.746      .395       .737      3.932
  500     3.037      .368       .771      4.033
 1000     3.241      .350       .794      4.105


7. Relation between the distribution of the largest minus the mean of all
n observations and the largest minus the mean of the remaining n - 1 items.
The following relation is of interest concerning these two statistics:

Let

$$u_n = x_n - \bar{x} = x_n - \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\{(n-1)x_n - x_1 - x_2 - \cdots - x_{n-1}\}.$$

Let

$$v_n = x_n - \bar{x}_n = x_n - \frac{x_1 + x_2 + \cdots + x_{n-1}}{n-1} = \frac{1}{n-1}\{(n-1)x_n - x_1 - x_2 - \cdots - x_{n-1}\}.$$

Hence,

$$u_n = \frac{n-1}{n}\,v_n$$

and

$$P\{v_n \le k\} = P\left\{u_n \le \frac{n-1}{n}\,k\right\},$$

i.e. the probability integral of the largest minus the mean of the other observations
may be obtained by interpolation on the distribution of the largest minus
the mean of all n items in the sample.
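
The relation of this section is easily checked numerically. The following sketch assumes the reading u_n = (n−1) v_n / n, where u_n is the largest observation minus the mean of all n items and v_n is the largest minus the mean of the remaining n − 1 items; the symbol v_n, the function name `largest_minus_means` and the data are hypothetical labels used only for this illustration.

```python
def largest_minus_means(sample):
    """Return (u_n, v_n) for a sample (hypothetical helper).

    u_n: largest observation minus the mean of all n observations.
    v_n: largest observation minus the mean of the other n - 1.
    """
    xs = sorted(sample)
    n = len(xs)
    largest = xs[-1]
    u_n = largest - sum(xs) / n
    v_n = largest - sum(xs[:-1]) / (n - 1)
    return u_n, v_n

# Illustrative data only (not from the paper).
u_n, v_n = largest_minus_means([1.0, 2.0, 4.0, 7.0])
n = 4
print(u_n, (n - 1) / n * v_n)  # the two values agree
```

A probability statement about v_n can therefore be read from a table of the distribution of u_n by entering it at (n − 1)k/n.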

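8. The distribution of $S_n^2/S^2$ and $S_1^2/S^2$. As indicated in the Summary, we propose
the sample criterion

$$\frac{S_n^2}{S^2} = \frac{\sum_{i=1}^{n-1} (x_i - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \bar{x}_n = \frac{1}{n-1}\sum_{i=1}^{n-1} x_i,$$

for testing the significance of the largest observation and the criterion

$$\frac{S_1^2}{S^2} = \frac{\sum_{i=2}^{n} (x_i - \bar{x}_1)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \bar{x}_1 = \frac{1}{n-1}\sum_{i=2}^{n} x_i,$$

for testing whether the smallest observation is outlying. We now find the probability
distribution of $S_n^2/S^2$; hence, also that of $S_1^2/S^2$.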

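Returning to the density function

$$dF(\eta_2, \eta_3, \ldots, \eta_n) = \frac{n!}{(2\pi)^{(n-1)/2}} \exp\left\{-\frac{1}{2}\sum_{i=2}^{n} \eta_i^2\right\} d\eta_2\, d\eta_3 \cdots d\eta_n$$

of Section 5, we make the polar transformation

$$\eta_2 = r \sin\theta_n \sin\theta_{n-1} \cdots \sin\theta_4 \sin\theta_3,$$
$$\eta_3 = r \sin\theta_n \sin\theta_{n-1} \cdots \sin\theta_4 \cos\theta_3,$$
$$\eta_4 = r \sin\theta_n \sin\theta_{n-1} \cdots \cos\theta_4,$$
$$\cdots$$
$$\eta_{n-1} = r \sin\theta_n \cos\theta_{n-1},$$
$$\eta_n = r \cos\theta_n.$$

Then

$$\sum_{i=2}^{n} \eta_i^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n} (x_i - \bar{x})^2 = r^2, \qquad \sum_{i=2}^{n-1} \eta_i^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n-1} (x_i - \bar{x}_n)^2 = r^2 \sin^2\theta_n.$$

Hence,

$$\sin^2\theta_n = \frac{\sum_{i=1}^{n-1} (x_i - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_n^2}{S^2}.$$

The Jacobian of the above transformation is

$$r^{n-2} \sin^{n-3}\theta_n \sin^{n-4}\theta_{n-1} \cdots \sin^2\theta_5 \sin\theta_4,$$

and since $0 \le r \le \infty$

$$dF(\theta_n, \theta_{n-1}, \ldots, \theta_4, \theta_3) = \frac{n!}{(2\pi)^{(n-1)/2}}\, 2^{(n-3)/2}\, \Gamma\!\left(\frac{n-1}{2}\right) \sin^{n-3}\theta_n \cdots \sin^2\theta_5 \sin\theta_4\, d\theta_n \cdots d\theta_5\, d\theta_4\, d\theta_3.$$

Since the restrictions on the $\eta_i$ are

$$\eta_2 \ge 0, \qquad \eta_r \ge \sqrt{\frac{r-2}{r}}\,\eta_{r-1}, \quad r = 3, 4, \ldots, n,$$

we have

$$\tan\theta_n \cos\theta_{n-1} \le \sqrt{\frac{n}{n-2}}, \qquad n \ge 4,$$

or

$$\tan\theta_n \le \sqrt{\frac{n}{n-2}}\, \sec\theta_{n-1}, \qquad n \ge 4,$$

$$0 \le \theta_3 \le \frac{\pi}{3}.$$

Thus, letting $K_n = \dfrac{n!\,\Gamma\!\left(\frac{n-1}{2}\right)}{2\pi^{(n-1)/2}}$, we see that

(13) $$K_n \int_0^{\pi/3}\!\int_0^{l_4} \cdots \int_0^{l_n} \sin^{n-3}\theta_n \cdots \sin\theta_4\, d\theta_n \cdots d\theta_4\, d\theta_3 = 1,$$

where $l_i = \tan^{-1}\!\left(\sqrt{\dfrac{i}{i-2}}\, \sec\theta_{i-1}\right)$.

Upon reversing the order of integration (the variable limits are monotonic) we
get for n = 3

$$K_3 \int_0^{\pi/3} d\theta_3 = 1,$$

so that

$$P(\theta_3 < \theta) = K_3 \int_0^{\theta} d\theta_3, \qquad 0 \le \theta \le M_3 = \tan^{-1}\sqrt{3}.$$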


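When n = 4, we obtain

$$K_4 \int_0^{m_4}\!\int_0^{\pi/3} \sin\theta_4\, d\theta_3\, d\theta_4 + K_4 \int_{m_4}^{M_4}\!\int_{L_4}^{\pi/3} \sin\theta_4\, d\theta_3\, d\theta_4 = 1,$$

where $m_4 = \tan^{-1}\sqrt{4/2}$, $M_4 = \tan^{-1}\sqrt{4\cdot 2}$ and $L_4 = \sec^{-1}\!\left(\sqrt{1/2}\, \tan\theta_4\right)$,
so that

(15a) $$P(\theta_4 < \theta) = \frac{\pi}{3} K_4 \int_0^{\theta} \sin\theta_4\, d\theta_4 \quad \text{when } 0 \le \theta \le m_4 = \tan^{-1}\sqrt{4/2}$$

and

(15b) $$P(\theta_4 < \theta) = \frac{\pi}{3} K_4 \int_0^{m_4} \sin\theta_4\, d\theta_4 + K_4 \int_{m_4}^{\theta}\!\int_{L_4}^{\pi/3} \sin\theta_4\, d\theta_3\, d\theta_4$$

when $m_4 = \tan^{-1}\sqrt{4/2} \le \theta \le M_4 = \tan^{-1}\sqrt{4\cdot 2}$.

When n = 5, we get

$$K_5 \int_0^{m_5}\!\int_0^{m_4}\!\int_0^{\pi/3} \sin^2\theta_5 \sin\theta_4\, d\theta_3\, d\theta_4\, d\theta_5 + K_5 \int_0^{m_5}\!\int_{m_4}^{M_4}\!\int_{L_4}^{\pi/3} \sin^2\theta_5 \sin\theta_4\, d\theta_3\, d\theta_4\, d\theta_5$$
$$+ K_5 \int_{m_5}^{M_5}\!\int_{L_5}^{M_4}\!\int_{L_4}^{\pi/3} \sin^2\theta_5 \sin\theta_4\, d\theta_3\, d\theta_4\, d\theta_5 = 1$$

(where $L_4 = \sec^{-1}\!\left(\sqrt{1/2}\, \tan\theta_4\right)$ is to be taken as 0 whenever $\theta_4 < m_4$ and
$L_5 = \sec^{-1}\!\left(\sqrt{3/5}\, \tan\theta_5\right)$), so that

(16a) $$P(\theta_5 < \theta) = \frac{K_5}{K_4} \int_0^{\theta} \sin^2\theta_5\, d\theta_5 \quad \text{when } 0 \le \theta \le m_5 = \tan^{-1}\sqrt{5/3}$$

and

(16b) $$P(\theta_5 < \theta) = \frac{K_5}{K_4} \int_0^{m_5} \sin^2\theta_5\, d\theta_5 + K_5 \int_{m_5}^{\theta}\!\int_{L_5}^{M_4}\!\int_{L_4}^{\pi/3} \sin^2\theta_5 \sin\theta_4\, d\theta_3\, d\theta_4\, d\theta_5,$$

where $m_5 = \tan^{-1}\sqrt{5/3} \le \theta \le M_5 = \tan^{-1}\sqrt{5\cdot 3}$,
and we put $L_4 = \sec^{-1}\!\left(\sqrt{1/2}\, \tan\theta_4\right) = 0$ whenever $\theta_4 < m_4 = \tan^{-1}\sqrt{4/2}$.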



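For a sample of n items

(17a) $$P(\theta_n < \theta) = \frac{K_n}{K_{n-1}} \int_0^{\theta} \sin^{n-3}\theta_n\, d\theta_n = \frac{n}{2}\, I_{\sin^2\theta}\!\left(\frac{n-2}{2}, \frac{1}{2}\right) \quad \text{when } 0 \le \theta \le m_n = \tan^{-1}\sqrt{\frac{n}{n-2}},$$

(17b) $$P(\theta_n < \theta) = \frac{n}{2}\, I_{n/(2(n-1))}\!\left(\frac{n-2}{2}, \frac{1}{2}\right) + K_n \int_{m_n}^{\theta}\!\int_{L_n}^{M_{n-1}} \cdots \int_{L_4}^{\pi/3} \sin^{n-3}\theta_n \cdots \sin\theta_4\, d\theta_3\, d\theta_4 \cdots d\theta_n$$

when $m_n = \tan^{-1}\sqrt{\dfrac{n}{n-2}} \le \theta \le M_n = \tan^{-1}\sqrt{n(n-2)}$,

where $I_x(p, q)$ is K. Pearson's Incomplete Beta Function Ratio [19]. It is to be
understood in (17) that

$$L_i = \sec^{-1}\!\left(\sqrt{\frac{i-2}{i}}\, \tan\theta_i\right) \quad \text{for } i = 4, 5, \ldots, n-1$$

is to be taken as zero when $\theta_i < \tan^{-1}\sqrt{\dfrac{i}{i-2}}$.

Percentage points for the sample statistic

$$\sin^2\theta_n = \frac{S_n^2}{S^2} = \frac{\sum_{i=1}^{n-1} (x_i - \bar{x}_n)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

or the statistic $S_1^2/S^2$ are given in Table I and were obtained by inverse interpolation
on the tabulation of the probability integral (17) above. Percentage
points for the Pearson and Chandra Sekar statistics, $T_n = \dfrac{x_n - \bar{x}}{s}$ or
$T_1 = \dfrac{\bar{x} - x_1}{s}$ (where $s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$), are given in Table IA. The statistics
$S_n^2/S^2$ and $T_n$ are related by the formula

$$\frac{S_n^2}{S^2} = 1 - \frac{T_n^2}{n-1}.$$

* It has been noted that (17a) gives a good approximation to (17b) when $\theta \ge \tan^{-1}\sqrt{n/(n-2)}$
provided we are interested in the important practical region P ≤ .10, at least for n ≤ 116.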
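
The relation between $S_n^2/S^2$ and $T_n$, and the first branch of the probability integral, can be sketched in a few lines of Python. The sketch assumes the readings $S_n^2/S^2 = 1 - T_n^2/(n-1)$ (with s² using divisor n) and $P(\theta_n < \theta) = n\,\Gamma(\frac{n-1}{2}) / (\sqrt{\pi}\,\Gamma(\frac{n-2}{2})) \int_0^{\theta} \sin^{n-3} t\, dt$ for $\sin^2\theta \le n/(2(n-1))$; the function names `grubbs_ratio_and_T` and `prob_17a`, the Simpson-rule step count and the data are hypothetical.

```python
import math

def grubbs_ratio_and_T(sample):
    """Return (S_n^2 / S^2, T_n) for the largest observation (hypothetical helper)."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    S2 = sum((x - mean) ** 2 for x in xs)
    rest = xs[:-1]                      # sample with the largest observation removed
    mean_rest = sum(rest) / (n - 1)
    Sn2 = sum((x - mean_rest) ** 2 for x in rest)
    s = math.sqrt(S2 / n)               # divisor n, as assumed in the text
    return Sn2 / S2, (xs[-1] - mean) / s

def prob_17a(n, ratio, steps=2000):
    """P(S_n^2/S^2 < ratio) from the assumed form of (17a); needs n >= 3.

    Valid only while ratio <= n / (2 (n - 1)); `steps` must be even
    (composite Simpson rule).
    """
    if ratio > n / (2.0 * (n - 1)):
        raise ValueError("ratio lies outside the range covered by (17a)")
    theta = math.asin(math.sqrt(ratio))
    const = n * math.gamma((n - 1) / 2) / (math.sqrt(math.pi) * math.gamma((n - 2) / 2))
    h = theta / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * math.sin(k * h) ** (n - 3)
    return const * total * h / 3.0

# Illustrative data only (not from the paper).
ratio, T_n = grubbs_ratio_and_T([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(ratio, 1.0 - T_n ** 2 / 7.0)   # both sides of the relation
print(prob_17a(8, ratio))
```

Small values of the ratio (equivalently, large values of $T_n$) point toward the largest observation being outlying; the second branch (17b) is not attempted in this sketch.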





The statistic $T_n$ (or $T_1$) is easier to compute than $S_n^2/S^2$ (or $S_1^2/S^2$). The tabulation of the multiple integral (17) was carried out on the Bell Relay Computers at the Ballistic Research Laboratories.

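9. The distribution of $S_{n-1,n}^2/S^2$ and $S_{1,2}^2/S^2$. As indicated in the Summary, the criterion for judging the significance of the two largest observations is

$$\frac{S_{n-1,n}^2}{S^2} = \frac{\sum_{i=1}^{n-2}(x_i - \bar{x}_{n-1,n})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \quad \text{where } \bar{x}_{n-1,n} = \frac{1}{n-2}\sum_{i=1}^{n-2} x_i,$$

and that for testing the two smallest observations is

$$\frac{S_{1,2}^2}{S^2} = \frac{\sum_{i=3}^{n}(x_i - \bar{x}_{1,2})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \quad \text{where } \bar{x}_{1,2} = \frac{1}{n-2}\sum_{i=3}^{n} x_i.$$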

From the preceding section, we note that

« n*-‘i 

£.? 

7^ 


n - 2 or, 


T'' 1 2 T*' 9 8 * a A < a « 

* 7 , « r , £ >?< “ r am sin e^-i . 


(»!! 


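Hence,

(18) $$\sin^2\theta_n \sin^2\theta_{n-1} = \frac{\sum_{i=1}^{n-2}(x_i - \bar{x}_{n-1,n})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2},$$

so that if we find the distribution of

$$\sin^2\theta_n \sin^2\theta_{n-1} = \sin^2\Lambda_n, \text{ say,}$$

then we have the distribution of $S_{n-1,n}^2/S^2$ and hence also that of $S_{1,2}^2/S^2$, i.e.

(19) $$P\{\sin^2\Lambda_n < \lambda\} = P\{\Lambda_n < \sin^{-1}\sqrt{\lambda}\}.$$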

Returning to the multiple integral (13), let 


sin Art « sin fi„ sin On-i, 


A<>«0v, — 1, 

The Jacobian of this transformation is given by 

iM. 




3(A«, " •, A|) Vsin’ A„_i — sin* An 

The limits of integration for An are given by 

Vn sin Art-i 


0 < An sin 


-I 







and, of course, those for $\Lambda_{n-1}, \cdots, \Lambda_3$ are the same as the limits for $\theta_{n-1}, \cdots, \theta_3$,
respectively. Hence, substituting in (13), we obtain


»t/ 3 y.Un"‘ VlTiloijA, 

Jo Jo 


1 


w—‘II 


( 20 ) 


pRln 

Jo 




H—IJ —( R—3)9 


sin”"* A„ sin""'* A«_i 


sin’ A{ sin Aj cos A*, rfA, 


(iAj 


sm"'* Ar_i \/sin' A„_i -- .-in- A, 


*= 1 , 


Reversing the order of integration, we have 


( 21 ) 


, _,a/ "(n-s) . .- 

^ f‘‘" ^ b-l)(n-2) r‘«“’V(n-l)(«-J) j 


j-rl3 

J««'V!ll< ‘»nA< 


8m"~’AnSm."~*An-i • • • sinA-i cos Ar dA, • • ■ dA* 
sin" * A„_i -x/sin* Ar_i — sin’ A, 


^for A, < tan ‘ tuiu.al to 


zero where t 


( 22 ) 


n *fi 


so that for n = 4, 

lA ftl) 


sin A< cos AI dAi dAi 


=== sin Aj Vsin* Aj — sin* At 


where 0 < A < sin"* 
and for $n = 5$,

r(A. < A) = if. f‘ T""'® 

(23) 


**/5+»ltt5A| 






•N/i+Siln^Aj 
»/® 


where 0 < A < sin"’ , etc. 


■/ 


sin’ At C09 At dAj dAt clA^ 
sin Ai -v/sin’ At — sin* A* 


We remark that an obvious extension of the above principles should lead to the distributions of

$S_{n-2,n-1,n}^2/S^2$ and $S_{1,2,3}^2/S^2$,

$S_{n-3,n-2,n-1,n}^2/S^2$ and $S_{1,2,3,4}^2/S^2$,

etc., although the tabulation of such probability integrals may be exceedingly difficult.

The problem of tabulating the probability integral (21) involves a double quadrature process and has been carried out on the Bell Relay Computers at the Ballistic Research Laboratories for $n = 4$ to $n = 20$, inclusive. Table V gives some useful percentage points for these sample sizes.





TABLE V

Table of Percentage Points for $S_{n-1,n}^2/S^2$ or $S_{1,2}^2/S^2$

  n      1%      2.5%     5%      10%

  4    .0000   .0002   .0008   .0031
  5    .0035   .0090   .0183   .0376
  6    .0186   .0349   .0565   .0921
  7    .0440   .0708   .1020   .1479
  8    .0750   .1101   .1478   .1994
  9    .1082   .1492   .1909   .2454
 10    .1415   .1865   .2305   .2863
 11    .1730   .2212   .2666   .3228
 12    .2044   .2536   .2996   .3552
 13    .2333   .2836   .3295   .3843
 14    .2605   .3112   .3568   .4106
 15    .2850   .3367   .3818   .4345
 16    .3008   .3603   .4048   .4562
 17    .3321   .3822   .4259   .4761
 18    .3530   .4025   .4455   .4944
 19    .3725   .4214   .4630   .5113
 20    .3909   .4391   .4804   .5269

$S^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$;

$S_{n-1,n}^2 = \sum_{i=1}^{n-2}(x_i - \bar{x}_{n-1,n})^2$, where $\bar{x}_{n-1,n} = \frac{1}{n-2}\sum_{i=1}^{n-2} x_i$;

$S_{1,2}^2 = \sum_{i=3}^{n}(x_i - \bar{x}_{1,2})^2$, where $\bar{x}_{1,2} = \frac{1}{n-2}\sum_{i=3}^{n} x_i$.

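10. Comment on the distribution of $S_{1,n}^2/S^2$. In connection with the distribution of the statistic

$$\frac{S_{1,n}^2}{S^2} = \frac{\sum_{i=2}^{n-1}(x_i - \bar{x}_{1,n})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \quad \text{where } \bar{x}_{1,n} = \frac{1}{n-2}\sum_{i=2}^{n-1} x_i,$$

for testing simultaneously whether the smallest and largest observations are outlying, an investigation indicates that since

$$\sum_{i=1}^{n} x_i^2 = n\bar{x}^2 + \frac{n}{n-1}(x_n - \bar{x})^2 + \frac{n-1}{n-2}(x_1 - \bar{x}_n)^2 + \frac{1}{2}(x_3 - x_2)^2 + \frac{2}{3}\left(x_4 - \frac{x_2 + x_3}{2}\right)^2 + \cdots + \frac{n-3}{n-2}\left(x_{n-1} - \frac{x_2 + \cdots + x_{n-2}}{n-3}\right)^2,$$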






then the transformation

(24)
$$\begin{aligned}
\sqrt{2\cdot 1}\, v_2 &= -x_2 + x_3,\\
\sqrt{3\cdot 2}\, v_3 &= -x_2 - x_3 + 2x_4,\\
\sqrt{4\cdot 3}\, v_4 &= -x_2 - x_3 - x_4 + 3x_5,\\
&\;\;\vdots\\
\sqrt{(n-2)(n-3)}\, v_{n-2} &= -x_2 - x_3 - \cdots - x_{n-2} + (n-3)x_{n-1},\\
\sqrt{(n-1)(n-2)}\, v_{n-1} &= -(n-2)x_1 + x_2 + x_3 + \cdots + x_{n-1},\\
\sqrt{n(n-1)}\, v_n &= -x_1 - x_2 - x_3 - \cdots - x_{n-1} + (n-1)x_n,\\
\sqrt{n}\, v_{n+1} &= x_1 + x_2 + \cdots + x_n,
\end{aligned}$$

followed by transformations of the type (11) and that of Section 9 may lead to the distribution of $S_{1,n}^2/S^2$. However, the limits of integration do not turn out to be functions of single variables and the task of computing the resulting multiple integral may be rather difficult.
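As a numerical sanity check on the linear forms in (24), the Python sketch below builds them as the rows of a matrix, confirms that the rows are orthonormal, and confirms that the first $n - 3$ forms reproduce the numerator $S_{1,n}^2$ of the statistic. It is an illustration only, not part of the original derivation: the helper names `transformation_24` and `sum_sq`, the symbol $v$, and the sample values are assumptions.

```python
import math

def transformation_24(n):
    """Rows of the n x n matrix giving v_2, ..., v_{n+1} (requires n >= 4)."""
    rows = []
    for k in range(2, n - 1):                        # v_2, ..., v_{n-2}
        row = [0.0] * n
        for j in range(2, k + 1):                    # -x_2 - ... - x_k
            row[j - 1] = -1.0
        row[k] = float(k - 1)                        # + (k - 1) x_{k+1}
        scale = math.sqrt(k * (k - 1))
        rows.append([c / scale for c in row])
    row = [-(n - 2.0)] + [1.0] * (n - 2) + [0.0]     # v_{n-1}
    scale = math.sqrt((n - 1) * (n - 2))
    rows.append([c / scale for c in row])
    row = [-1.0] * (n - 1) + [n - 1.0]               # v_n
    scale = math.sqrt(n * (n - 1))
    rows.append([c / scale for c in row])
    rows.append([1.0 / math.sqrt(n)] * n)            # v_{n+1}
    return rows

def sum_sq(xs):
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

n = 7
A = transformation_24(n)
# Largest deviation of A A^T from the identity matrix
ortho_error = max(
    abs(sum(a * b for a, b in zip(A[r], A[s])) - (1.0 if r == s else 0.0))
    for r in range(n) for s in range(n)
)
x = sorted([3.2, 4.1, 4.4, 4.9, 5.0, 5.3, 7.8])      # hypothetical ordered sample
v = [sum(a * xi for a, xi in zip(row, x)) for row in A]
ratio_from_v = sum(t * t for t in v[: n - 3]) / sum(t * t for t in v[: n - 1])
ratio_direct = sum_sq(x[1:-1]) / sum_sq(x)           # S_{1,n}^2 / S^2 computed directly
print(ortho_error, ratio_from_v, ratio_direct)
```

The agreement of the two ratios reflects that $v_2, \cdots, v_{n-2}$ involve only the inner observations $x_2, \cdots, x_{n-1}$.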


11. Examples on testing outlying observations for rejection. We now turn to the problem of applying our theory to particular practical examples of data which appear to have outlying observations. Apparently, in the following examples there were not sufficient practical or experimental grounds to reject the suspected outliers and hence some statistical judgement became necessary either to support retaining the "outliers" in the sample or leave little doubt that certain of the observations should be questioned.

Example 1. Our first example has almost become a classical one as Irwin [3], Rider [2], and other writers on the subject including Chauvenet, Peirce, Gould, etc. (see Rider's survey [2]) all refer to it, applying their various tests. The example consists of a sample of 15 observations of the vertical semi-diameters of Venus made by Lieut. Herndon in 1846 and is given in William Chauvenet's A Manual of Spherical and Practical Astronomy, II (5th ed., 1876), p. 562. The individual residuals or deviations from the mean are:


   -0.30"    0.48     0.63    -0.22     0.18
   -0.44    -0.24    -0.13    -0.05     0.39
    1.01     0.06    -1.40     0.20     0.10

Arranging the observations in increasing order of magnitude, we have

   -1.40"   -0.44    -0.30    -0.24    -0.22
   -0.13    -0.05     0.06     0.10     0.18
    0.20     0.39     0.48     0.63     1.01





and it is seen that two of the residuals, -1.40 and 1.01, appear to be outliers. Rider [2] indicates that the above observations have been referred to by previous writers as "residuals"; nevertheless their sum is 0.27, so that the sample mean is $\bar{x} = .018$. Let us apply the exact test, i.e. $T_1$ of Pearson and Chandra Sekar or $S_1^2/S^2$ developed in Section 8 for a single outlier, to the least observation, -1.40. We find $x_1 = -1.40$, $\bar{x} = .018$ and $s = .532$ (alternatively, we find $S^2 = 4.2496$ using all 15 observations and $S_1^2 = 2.0953$, which is based on 14 observations, the suspected outlier -1.40 not being included). Further, $T_1 = (\bar{x} - x_1)/s = (.018 + 1.40)/.532 = 2.665$ (or $S_1^2/S^2 = 0.4931$) and from Table IA (or Table I) we see that $0.01 < P < 0.025$ so that we would reject the observation -1.40 when using the 5% level of significance. Having rejected -1.40, we now have left a sample of 14 observations and test the greatest one, i.e. 1.01. For $T_n$ based on the remaining 14 observations, we have $n = 14$, $x_n = 1.01$, $\bar{x} = .119$ and $s = .387$ (alternatively, for the new sums of squares, we find $S_n^2 = 1.2409$ leaving out 1.01 and $S^2 = 2.0953$ including the observation 1.01). Hence, $T_n = (x_n - \bar{x})/s = (1.01 - .119)/.387 = 2.302$ (or $S_n^2/S^2 = 0.5922$) and from Table IA (or I), we find $P$ slightly less than .10, so that we decide to retain the observation 1.01.
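The arithmetic of Example 1 can be retraced with a short Python sketch. The residuals are those listed above; the function names `smallest_stats` and `largest_stats` are hypothetical, and $s$ is computed with divisor $n$, which is the convention that reproduces the quoted value $s = .532$.

```python
import math

residuals = [-0.30, 0.48, 0.63, -0.22, 0.18, -0.44, -0.24, -0.13,
             -0.05, 0.39, 1.01, 0.06, -1.40, 0.20, 0.10]

def sum_sq(xs):
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def smallest_stats(xs):
    """Return (T_1, S_1^2 / S^2) for the smallest observation."""
    xs = sorted(xs)
    n = len(xs)
    S2 = sum_sq(xs)
    s = math.sqrt(S2 / n)
    return (sum(xs) / n - xs[0]) / s, sum_sq(xs[1:]) / S2

def largest_stats(xs):
    """Return (T_n, S_n^2 / S^2) for the largest observation."""
    xs = sorted(xs)
    n = len(xs)
    S2 = sum_sq(xs)
    s = math.sqrt(S2 / n)
    return (xs[-1] - sum(xs) / n) / s, sum_sq(xs[:-1]) / S2

# Step 1: test the least observation, -1.40, in the full sample of 15
T_1, ratio_1 = smallest_stats(residuals)
# Step 2: drop -1.40 and test the greatest observation, 1.01, among the remaining 14
remaining = sorted(residuals)[1:]
T_n, ratio_n = largest_stats(remaining)
print(round(T_1, 3), round(ratio_1, 4), round(T_n, 3), round(ratio_n, 4))
```

Carried at full precision, $T_1$ comes out near 2.664 rather than 2.665, because the text divides the rounded values .018 + 1.40 by the rounded $s = .532$.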

It would have been interesting nevertheless to see whether or not the test $S_{1,n}^2/S^2$
would have rejected simultaneously the observations -1.40 and 1.01
if percentage points for the distribution of this statistic were available.

It is of interest to remark that for this particular example Irwin [3, page 245], using the difference between the first two individuals divided by an estimate of $\sigma$, i.e. $(x_2 - x_1)/\sigma$, concluded also that -1.40 but not 1.01 should be rejected. In testing both of these observations, Irwin used the single biased estimate for $\sigma$,



(assuming $\bar{x} = 0$),

based on all 15 observations. It is a mere coincidence, of course, that for this example Irwin's test gives the same result as the exact test $T_1$ or the test based on the ratio $S_1^2/S^2$. In this connection, Irwin rightly calls attention to the fact that in dealing with a sample of only 15 observations the standard deviation of the sample is a very unreliable estimate of the population standard deviation.

It is remarked that here we would, of course, hesitate to apply the test $(\bar{x} - x_1)/\sigma$ to the observation -1.40 as we do not have available an accurate estimate of $\sigma$ from past data.

Example 2. The following ranges (horizontal distances from gun muzzle to 
point of impact) were obtained in firing projectiles from a weapon at a constant 
angle of elevation and at the same weight of charge of propellant powder: 





Distances in yards

    4782    4420    4838    4803
    4765    4730    4549    4833

It is desired to know whether the projectiles exhibit uniformity in ballistic behavior or if some of the ranges, such as 4549 and 4420, are not consistent with the others.

Arranging the distances or ranges in increasing order of magnitude,

    4420    4549    4730    4765
    4782    4803    4833    4838

we suspect the presence of two outliers, i.e. 4420 and 4549. Having no available knowledge of $\sigma$ from past data for this example, an intuitively efficient test to apply would be that of Section 9, i.e. $S_{1,2}^2/S^2$.

We find

$$\frac{S_{1,2}^2}{S^2} = \frac{\sum_{i=3}^{8}(x_i - \bar{x}_{1,2})^2}{\sum_{i=1}^{8}(x_i - \bar{x})^2} = .054$$

which is significant at the .01 level (Table V) and consequently we would judge the distances 4420 and 4549 yds. as being unusually low.

As a matter of interest and as a recommended temporary practical expedient for testing several "outliers", consider for example the last seven of the above ordered observations,

    4549    4730    4765    4782
    4803    4833    4838

and apply the exact test, $S_1^2/S^2$, to the smallest observation, 4549. We find $S_1^2/S^2 = .145$ so that $.01 < P < .025$ from Table I and we should thus reject 4549 from the sample of seven. Moreover, we should now surely reject 4420 as being outlying, arriving at the same result we had for the test $S_{1,2}^2/S^2$. Thus, as a general temporary expedient in testing for "outliers" one could rank the observations, and apply the tests $S_1^2/S^2$ (or $S_n^2/S^2$) and $S_{1,2}^2/S^2$ (or $S_{n-1,n}^2/S^2$), thus working from the "inside" observations of the ranked sample in order to establish consistency of the observations.
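The two ratios quoted in Example 2 can be recomputed directly from the listed ranges. The sketch below is illustrative; the variable names are hypothetical and only the eight distances come from the text.

```python
def sum_sq(xs):
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

ranges = [4782, 4420, 4838, 4803, 4765, 4730, 4549, 4833]   # yards
ordered = sorted(ranges)

# Test of the two smallest observations in the sample of eight
ratio_two_smallest = sum_sq(ordered[2:]) / sum_sq(ordered)

# "Inside-out" expedient: drop 4420, then test 4549 in the remaining seven
seven = ordered[1:]
ratio_one_smallest = sum_sq(seven[1:]) / sum_sq(seven)

print(round(ratio_two_smallest, 3), round(ratio_one_smallest, 3))
```

Both ratios share the same numerator, the sum of squares of the six largest ranges about their own mean; only the reference sum of squares in the denominator changes.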





12. Additional comments. Although we have used a significance level of .05 in the examples, it may be preferable from a practical viewpoint to reject outlying observations only at a lower level, such as .01 or .005.

Extensions of the ideas for testing outlying observations presented in this paper may lead to efficient sample criteria for testing the significance of various numbers of high, low, or simultaneously high and low sample values. However, the mathematical details would probably be complicated. In this connection, it is remarked nevertheless that the advent of high-speed computing devices may have considerable bearing on establishing experimentally any probability distribution. That is to say, high-speed electronic computing devices could probably be programmed to generate random numbers with frequencies equal to those of the normal (or any other) distribution, to compute various functions (such as the ratios in this paper) of sample values, etc., and establish frequency distributions to a desired order of accuracy.
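The experimental approach suggested here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the tables in this paper: the function names, the seed, and the number of replications are arbitrary choices, and the sampling error of the result depends on the number of replications.

```python
import random

def sum_sq(xs):
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def two_largest_ratio(xs):
    """S_{n-1,n}^2 / S^2: omit the two largest observations from the numerator."""
    xs = sorted(xs)
    return sum_sq(xs[:-2]) / sum_sq(xs)

def simulated_percentage_point(n, level, reps=20000, seed=1950):
    """Empirical lower `level` point of the ratio for normal samples of size n."""
    rng = random.Random(seed)
    values = sorted(
        two_largest_ratio([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    return values[int(level * reps)]

point = simulated_percentage_point(8, 0.05)
print(point)   # to be compared with the 5% entry for n = 8 in Table V
```

Small values of the ratio indicate outlying observations, so the percentage points are taken from the lower tail of the simulated distribution.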

13. Acknowledgement. The author is greatly indebted to Prof. C. C. Craig under whose most competent guidance this work was carried out. Indeed, the stimulus and encouragement received by the author as a result of Prof. Craig's interest in the problem were paramount in orienting various phases of the subject matter and in accomplishing the results given herein. In connection with the computing, debts of gratitude are owed to several members of the staff of the Ballistic Research Laboratories. It is desired to express appreciation to Col. Leslie E. Simon, Director of the Ballistic Research Laboratories, for recognizing the importance and desirability of carrying out the computations on high-speed computing devices. In this connection, appreciation is also expressed to Dr. L. S. Dederick, Chief, Computing Laboratory, Ballistic Research Laboratories. The programming of the $H_n(x)$ functions on the Electronic Numerical Integrator and Computor was done by Dr. Derrick Lehmer of the University of California, who was with the Computing Laboratory, BRL, during the latter part of World War II, and Miss Ruth Lichterman. The author is particularly indebted to Dr. Franz Alt, Dr. Bernard Dimsdale, Miss R. Lichterman, Mr. John Holberton, Miss H. Marks, Mr. F. Spence, and others of the Computing Laboratory during the period the functions $H_n(x)$ were computed on the ENIAC. The computing of the distribution of the statistics $S_n^2/S^2$ and $S_{n-1,n}^2/S^2$ was done on the Bell Relay Computers under the direction of Mr. J. O. Harrison, Mrs. M. Musincup and Mr. E. Cushen. The author also desires to express appreciation to Miss Helen J. Coon for considerable computation and checking carried out on a table-model computing machine.

REFERENCES 

[1] E. S. Pearson and C. Chandra Sekar, "The efficiency of statistical tools and a criterion for the rejection of outlying observations", Biometrika, Vol. 28 (1936), pp. 308-320.

[2] P. R. Rider, "Criteria for rejection of observations", Washington University Studies, New Series, Science and Technology, No. 8, St. Louis (1933).

[3] J. O. Irwin, "On a criterion for the rejection of outlying observations", Biometrika, Vol. 17 (1926), pp. 238-250.

[4] "Student", Biometrika, Vol. 19 (1927), pp. 151-164.

[5] L. H. C. Tippett, "The extreme individuals and the range of samples taken from a normal population", Biometrika, Vol. 17 (1925), pp. 151-164.

[6] E. S. Pearson, "A further note on the distribution of range in samples taken from a normal population", Biometrika, Vol. 18 (1926), pp. 173-194.

[7] Tables for Statisticians and Biometricians, Part II, edited by Karl Pearson, pp. CX-CXIX.

[8] E. S. Pearson, "The percentage limits for the distribution of range in samples from a normal population", Biometrika, Vol. 24 (1932), pp. 404-417.

[9] H. O. Hartley, "The range in random samples", Biometrika, Vol. 32 (1942), pp. 334-348.

[10] E. S. Pearson and H. O. Hartley, "The probability integral of the range in samples of n observations from a normal population", Biometrika, Vol. 32 (1942), pp. 301-310.

[11] A. T. McKay, "The distribution of the difference between the extreme observation and the sample mean in samples of n from a normal universe", Biometrika, Vol. 27 (1935), pp. 466-471.

[12] W. R. Thompson, "On a criterion for the rejection of observations and the distribution of the ratio of the deviation to the sample standard deviation", Annals of Math. Stat., Vol. 6 (1936), pp. 214-219.

[13] H. J. Godwin, "On the distribution of the estimate of mean deviation obtained from samples from a normal population", Biometrika, Vol. 33 (1945), pp. 254-256.

[14] W.P.A., Tables of Probability Functions, Vols. I and II, New York, N.Y. (1942).

[15] H. O. Hartley, "Note on the calculation of the distribution of the estimate of mean deviation in normal samples", Biometrika, Vol. 33 (1945), pp. 257-258.

[16] "Tables of the probability integral of the mean deviation in normal samples", Biometrika, Vol. 33 (1946), pp. 259-265.

[17] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses", Phil. Trans. Roy. Soc. (London), Vol. 231 (1933), pp. 289-337.

[18] E. S. Pearson and H. O. Hartley, "Tables of the probability integral of the studentized range", Biometrika, Vol. 33 (1943), pp. 89-99.

[19] K. Pearson, Tables of the Incomplete Beta-Function, published by the Biometrika Office, University College, London.

[20] K. R. Nair, "The distribution of the extreme deviate from the sample mean and its studentized form", Biometrika, Vol. 35 (1948), pp. 118-144.




DISTRIBUTION OF THE CIRCULAR SERIAL CORRELATION
COEFFICIENT FOR RESIDUALS FROM A FITTED FOURIER

SERIES¹,²

By R. L. Anderson and T. W. Anderson³

North Carolina State College and Columbia University
CONTENTS 

Summary

1. Introduction

2. The use of fitted Fourier series

3. Tables of significance points of $R$

3.1 Significance points of $R$ using a seasonal trend for annual, semi-annual, bimonthly, and monthly data

3.2 Significance points of $R$ for other single-period trends

3.3 Example of use of significance points

4. Testing the hypothesis of lack of serial correlation

4.1 Statement of the problem

4.2 Preliminary transformations

4.3 The likelihood ratio criterion

5. The exact distribution of ${}_L R$

5.1 Introduction

5.2 Some special distributions of ${}_1 R = R$

5.3 Some special distributions of ${}_L R$ for $L > 1$

5.4 The exact distribution of ${}_L R$ when $\rho \neq 0$

6. Moments

6.1 The exact moments of $R$

6.2 Approximate moments of $R$ when $\rho \neq 0$

6.3 Approximate moment generating function of C and Y when $\rho \neq 0$

7. Approximate distributions of $R$

7.1 The Pearson Type I (Incomplete Beta) distribution

7.2 The normal approximation

References


Summary. In this paper the observations are considered to be normally dis¬ 
tributed with constant variance and means consisting of linear combinations 
of certain trigonometric functions. The likelihood ratio criterion for testing the 
independence of the observations against the alternatives of circular serial cor¬ 
relation of a given lag is found to be a function of the circular serial correlation 
coefficient for residuals from the fitted Fourier series (Section 4). The exact
distribution (Section 5), the moments (Section 6), and approximate distributions

¹ Included in Cowles Commission Papers, New Series, No. 42.

² Presented to the meeting of the Institute of Mathematical Statistics at New York, December 30, 1947.

³ Fellow of the John Simon Guggenheim Memorial Foundation; Research Consultant of the Cowles Commission for Research in Economics.



(Section 7) are given for the cases of greatest interest. From these results significance levels have been found (Section 3). The use of these levels is indicated (Section 2), and an example of their use is given (Section 3).


1. Introduction. Two mathematical models have been used extensively in time-series analysis. In one model the observation is the sum of a "systematic part" and a random error. The cyclical properties of this model result from the cyclical properties of the systematic part, which is usually taken to be a short Fourier series. The stochastic element is superimposed on the non-stochastic part, and the error at one time point does not affect a later observation. The other model is the stochastic difference equation or "autoregressive model." An observation is the sum of a linear function of previous observations and a random element. The cyclical properties follow from the properties of the difference equation (i.e., the linear combination of observations), but are disturbed by the random disturbance that is integrated into the system. A more general model can be constructed that includes both of the two mentioned. The observation can be taken as a linear combination of past observations and Fourier terms plus a random element.

In this paper, the linear combination will be only a multiple of some preceding observation. For lag 1, the model is of the form

(1) $$x_i - \mu_i = \rho(x_{i-1} - \mu_{i-1}) + u_i, \quad i = 1, 2, \cdots, N,$$

where $x_0 \equiv x_N$ and $\mu_0 \equiv \mu_N$. In (1), the $\{x_i\}$ are the $N$ observations; the $\{u_i\}$ are $N$ random disturbances, each assumed normally and independently distributed with zero mean and variance $\sigma^2$; the means $\{\mu_i\}$ are linear combinations of some of the $N$ functions of $i$: $\cos\frac{2\pi i g}{N}$ and $\sin\frac{2\pi i h}{N}$. For $N$ odd, $g = 0, 1, \cdots, \frac{1}{2}(N - 1)$; $h = 1, \cdots, \frac{1}{2}(N - 1)$. For $N$ even, $g = 0, 1, \cdots, \frac{1}{2}N$; $h = 1, \cdots, \frac{1}{2}N - 1$. Hence,

(2) $$\mu_i = \sum_{g'} \alpha_{g'} \cos\frac{2\pi i g'}{N} + \sum_{h'} \beta_{h'} \sin\frac{2\pi i h'}{N},$$

where $g'$ and $h'$ run over certain values of the ranges of $g$ and $h$, respectively. Let $K$ be the number of terms in (2). Usually the constant term, $\alpha_0$, is included (in this case $g = 0$ and $\cos\frac{2\pi i g}{N} = 1$). Of the $N$ trigonometric functions available,

$N - K$ are excluded. It should be noted that (1)

The sample estimates of $\alpha_{g'}$ and $\beta_{h'}$ are the usual regressions of $x_i$ on $\cos\frac{2\pi i g'}{N}$ and $\sin\frac{2\pi i h'}{N}$, respectively. Because of the orthogonality of these trigonometric terms, the estimates are





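(3)
$$a_{g'} = \frac{\sum_{i=1}^{N} x_i \cos\frac{2\pi i g'}{N}}{N/2}, \qquad b_{h'} = \frac{\sum_{i=1}^{N} x_i \sin\frac{2\pi i h'}{N}}{N/2}, \qquad g' \neq 0, \tfrac{1}{2}N,$$

$$a_0 = \sum_{i=1}^{N} x_i / N, \qquad a_{\frac{1}{2}N} = \sum_{i=1}^{N} x_i \cos \pi i / N = \sum_{i=1}^{N} (-1)^i x_i / N.$$

The fitted series is

(4) $$m_i = \sum_{g'} a_{g'} \cos\frac{2\pi i g'}{N} + \sum_{h'} b_{h'} \sin\frac{2\pi i h'}{N},$$

where the sums on $g'$ and $h'$ are over the ranges in (2).

The serial correlation coefficient suitable for this model is

(5) $$R = \frac{\sum_{i=1}^{N} (x_i - m_i)(x_{i-1} - m_{i-1})}{\sum_{i=1}^{N} (x_i - m_i)^2},$$

where $m_0 \equiv m_N$. This statistic can be used to estimate $\rho$, or it can be used to test hypotheses about $\rho$. In fact, for the circular model this statistic leads to the best tests [3].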
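A minimal Python sketch of the fitted series and of the circular coefficient $R$ may make the definitions concrete. It is an illustration under stated assumptions: the function names `fourier_fit` and `circular_serial_correlation` are hypothetical, each coefficient is obtained by dividing by the sum of squares of its own term (which equals $N/2$, or $N$ for the constant and the period-two term), and the test series is a pure cosine chosen so that the answer is known in closed form.

```python
import math

def fourier_fit(x, cos_freqs, sin_freqs):
    """Fitted values m_1, ..., m_N from the chosen orthogonal trigonometric terms."""
    N = len(x)
    terms = [[math.cos(2 * math.pi * i * g / N) for i in range(1, N + 1)]
             for g in cos_freqs]
    terms += [[math.sin(2 * math.pi * i * h / N) for i in range(1, N + 1)]
              for h in sin_freqs]
    m = [0.0] * N
    for term in terms:
        norm = sum(c * c for c in term)              # N/2, or N when g' = 0 or N/2
        coef = sum(xi * c for xi, c in zip(x, term)) / norm
        m = [mi + coef * c for mi, c in zip(m, term)]
    return m

def circular_serial_correlation(x, m):
    """Lag-1 circular serial correlation of the residuals x_i - m_i."""
    d = [xi - mi for xi, mi in zip(x, m)]
    # d[i - 1] with i = 0 wraps to the last residual, i.e. x_0 - m_0 = x_N - m_N
    num = sum(d[i] * d[i - 1] for i in range(len(d)))
    return num / sum(v * v for v in d)

N = 12
x = [math.cos(2 * math.pi * i / N) for i in range(1, N + 1)]   # pure period-12 cosine
m = fourier_fit(x, cos_freqs=[0], sin_freqs=[])                # only a constant is fitted
R = circular_serial_correlation(x, m)
print(R, math.cos(2 * math.pi / N))
```

For a residual series that is a pure cosine of frequency $g$, the coefficient equals $\cos(2\pi g/N)$, which is the sense in which the trigonometric vectors are characteristic vectors of the quadratic form in the numerator.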

It is hoped that the mathematical model studied in this paper can be used in 
the treatment of certain problems in economic time series. For example, the 
seasonal variation in a series of data may be considered as a "systematic part" 
made up of trigonometric components. In the next section we discuss in a more 
detailed way how the use of this model may arise in the field of economics.

We have considered circular serial correlation, although in most statistical 
problems it is non-circular serial correlation that is involved. The reason for 
treating the circular case is the inherent mathematical simplicity. The circular 
coefficient and Fourier series of the type (2) are “naturally” related. The relevant 
fact is that the vectors 



are characteristic vectors of the matrix of the quadratic form in $(x_i - m_i)$ of
the numerator of $R$. For this reason the distribution and significance points
are easily obtained.

In the usual applications the circular coefficient can be used even if the hypothesis alternative to independence of observations is non-circular serial correlation. The circular coefficient may not have as good power against non-circular alternatives as non-circular coefficients, such as

(6) $$\frac{\sum_{i=2}^{N} (x_i - m_i)(x_{i-1} - m_{i-1})}{\sum_{i=1}^{N} (x_i - m_i)^2}.$$

However, the difference between these two statistics is $(x_1 - m_1)(x_N - m_N) / \sum (x_i - m_i)^2$, and it can be shown that this converges stochastically to zero (as $N$ increases and $\rho$ remains fixed).
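The stated difference between the circular and non-circular coefficients is a one-line identity, which the following Python sketch verifies on made-up residuals. The residual values and the function name `circular_and_noncircular` are hypothetical.

```python
def circular_and_noncircular(d):
    """Return (circular, non-circular) lag-1 coefficients of residuals d_1..d_N."""
    den = sum(v * v for v in d)
    circular = sum(d[i] * d[i - 1] for i in range(len(d))) / den       # includes d_1 * d_N
    noncircular = sum(d[i] * d[i - 1] for i in range(1, len(d))) / den
    return circular, noncircular

d = [0.4, -1.1, 0.3, 0.9, -0.2, -0.6, 0.8, -0.5]    # hypothetical residuals x_i - m_i
circular, noncircular = circular_and_noncircular(d)
gap = circular - noncircular
expected_gap = d[0] * d[-1] / sum(v * v for v in d)  # (x_1 - m_1)(x_N - m_N) / sum of squares
print(gap, expected_gap)
```

The numerator of the gap is a single product while the denominator is a sum of $N$ squares, which is why the gap becomes negligible as $N$ grows.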


2. The use of fitted Fourier series. A linear combination of trigonometric terms may be used as a regression function when there is a "systematic part" (or "trend") that is periodic. For instance, it may be reasonable to assume that a series of agricultural data has a systematic component with certain periodicities due to variation in weather. Then one may ask whether this regression function "explains" all of the interrelations in the series.

An example taken from agricultural economics is a development of that given 
by Koopmans [8]. Suppose p, and 5, are the price and supply, respectively, of a 
given farm product at time t. Let Qi*" be the quantity demimded at time, i if 
P( = P, and be the quantity supplied at time I if p,„t « P, where P in an 
arbitrarily selected point of reference on the price scale, serving to define the 
Q’b. Let the market equations be defined as follows: 

Pt — P — ~ Qi"**) -h , 

(8) «. - Q\‘^ = {(Pi-L - P) -f- (7, . 

where $u$ and $v$ are random disturbances. The first equation expresses the price-depressing tendency of an abnormally large supply; the second expresses the supply-stimulating influence of abnormally high prices $L$ time units earlier (the time between planning the product and selling it). We can substitute from (7) at time $(t - L)$ into (8) and obtain


(9) 


3( Qt'^ - piqt-L — Qi-i) d- Wt , 


which IS of thejorm (1) for general lag L (i - 1 is replaced by t - L) if 

w n '"tJ “ "'ish to test the null JiypotliPsia, 

JsMhA “su>ne that our alternative hypothesis is //„ : p > 0, we con 

Wlv W ^ distribution of R. Sirai- 

oTw cLf/ would use the negative tail of the distribution of R. In 

otfier cases, if we believe p 0, we might wish to estimate p, 

variatiof consider using the Fourier series for seasonal 

appropSte tebks indications of the 

hypothesis $\rho = 0$. (a) Annual data. Here only a constant is fitted; this is the sample mean. The tables given in [2] or [5] are to be used. (b) Semi-annual data. To "correct" for variation of period two we fit a constant and $\cos \pi t = (-1)^t$. The table given in Section 3 for $P = 2$ is to be used. (c) Quarterly data. The four terms to be fitted are 1, $\cos \pi t = (-1)^t$, $\cos\frac{\pi t}{2}$, and $\sin\frac{\pi t}{2}$. The table given in Section 3 for $P = 2$ and 4 is to be used. (d) Bimonthly data. The six terms to be fitted are 1, $\cos \pi t$, $\cos\frac{\pi t}{3}$, $\sin\frac{\pi t}{3}$, $\cos\frac{2\pi t}{3}$, and $\sin\frac{2\pi t}{3}$. The table given in Section 3 for $P = 2$, 3, and 6 is to be used. (e) Monthly data. The twelve terms to be fitted are 1, $\cos\frac{\pi t}{6}$, $\sin\frac{\pi t}{6}$, $\cos\frac{\pi t}{3}$, $\sin\frac{\pi t}{3}$, $\cos\frac{\pi t}{2}$, $\sin\frac{\pi t}{2}$, $\cos\frac{2\pi t}{3}$, $\sin\frac{2\pi t}{3}$, $\cos\frac{5\pi t}{6}$, $\sin\frac{5\pi t}{6}$, and $\cos \pi t = (-1)^t$. The table given in Section 3 for $P = 2$, 12/5, 3, 4, 6, and 12 is to be used.

It is assumed here that the data are given for each time interval in a certain number of years. Then the residuals are the same as the residuals taken from means for each month or season. That is, if the data are monthly, one may compute the sample means for January, February, etc., and residuals are to be taken from the corresponding monthly means. The fitted Fourier coefficients are certain linear functions of these means.
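The equivalence between residuals from the full seasonal Fourier fit and residuals from seasonal means can be illustrated for the quarterly case. The Python sketch below is a hedged example: the data are made up, the function names are hypothetical, and the fit uses the four quarterly terms 1, $(-1)^t$, $\cos\frac{\pi t}{2}$, $\sin\frac{\pi t}{2}$ over a whole number of years.

```python
import math

def fourier_residuals(x):
    """Residuals after fitting 1, (-1)^t, cos(pi t/2), sin(pi t/2); len(x) a multiple of 4."""
    N = len(x)
    t = range(1, N + 1)
    terms = [
        [1.0] * N,
        [(-1.0) ** i for i in t],
        [math.cos(math.pi * i / 2) for i in t],
        [math.sin(math.pi * i / 2) for i in t],
    ]
    fit = [0.0] * N
    for term in terms:                       # the terms are mutually orthogonal
        coef = sum(a * b for a, b in zip(x, term)) / sum(b * b for b in term)
        fit = [f + coef * b for f, b in zip(fit, term)]
    return [xi - fi for xi, fi in zip(x, fit)]

def seasonal_mean_residuals(x, period=4):
    """Residuals from the mean of each season (quarter)."""
    means = [sum(x[s::period]) / len(x[s::period]) for s in range(period)]
    return [xi - means[i % period] for i, xi in enumerate(x)]

# Hypothetical quarterly data for three years
x = [10.2, 12.5, 15.1, 11.0, 9.8, 13.1, 14.6, 11.9, 10.9, 12.2, 15.8, 10.4]
max_diff = max(abs(a - b) for a, b in zip(fourier_residuals(x), seasonal_mean_residuals(x)))
print(max_diff)
```

The four terms span every sequence of period four, so projecting on them is the same as averaging within each quarter.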


3. Tables of significance points of $R$.

3.1. Significance points of $R$ using a seasonal trend for annual, semi-annual, bimonthly, and monthly data. The calculations of significance points of $R$ (lag 1 only) have been subdivided according to the number of terms included in the estimating equations, $m_i$. The significance points for only a constant in $m_i$ have been tabulated in [2] and [5]. Since the main use for $m_i$ equations involving sine and cosine terms seems to be for semi-annual, quarterly, bimonthly, and monthly data, for which $N$ is even, the results presented in this paper are for $N$ even. Then we will have all of the sine and cosine terms in pairs except for $\cos \pi t = (-1)^t$ and the constant term. We shall find it convenient to refer to the period $P_{g'} = N/g'$ or $P_{h'} = N/h'$ of the terms in (2).

We have calculated significance points $R'$, exact to 3 decimal places, for $\Pr\{R > R'\} = \alpha = .01$, .05, .95, and .99. The values of $R'$ corresponding to $\alpha = .01$ and .05 are usually indicated as the positive significance points and those corresponding to $\alpha = .95$ and .99, the negative significance points. In all of these cases, except for annual data, the distribution of $R$ is symmetrical. Hence only the positive significance points need be given, since the negative points are simply the corresponding positive points with opposite sign; that is, $R'(.95) = -R'(.05)$, $R'(.99) = -R'(.01)$.

The significance points were calculated from the exact distribution of $R$ given in Section 5 for all $N$ up to the values where the approximate significance points using an Incomplete Beta distribution (Section 7) were the same as the exact significance points. The Incomplete Beta significance points were used up to the value of $N$ for which a normal approximation was satisfactory. For some of the results, the normal points became sufficiently accurate to be used following the exact points.

The values of $R'$ are given in Table 1, except for (a), for the following values of $N$:

(a) Annual data: see the tables in [2] or [5].

(b) Semi-annual data ($P = 2$): $N = 6(2)60$. The exact points were needed for $N$ through 10 ($\alpha = .05$) and $N$ through 22 ($\alpha = .01$). The normal points could be used for $N = 60$ ($\alpha = .05$) but were still too large by .003 for $N = 60$ ($\alpha = .01$).

(c) Quarterly data ($P = 2, 4$): $N = 8(4)100$. The exact points were needed for $N$ through 20 ($\alpha = .05$) and $N$ through 32 ($\alpha = .01$). The normal points were adequate for all $N$ above 20 ($\alpha = .05$) but were still too large by .001 for $N = 100$ ($\alpha = .01$).

(d) Bimonthly data ($P = 2, 3, 6$): $N = 12(6)150$. The exact points were needed for $N$ through 24 ($\alpha = .05$) and $N$ through 30 ($\alpha = .01$). Again the normal points were adequate for all $N$ above 24 ($\alpha = .05$) but were still too large by .0005 for $N = 150$ ($\alpha = .01$).

(e) Monthly data ($P = 2, 12/5, 3, 4, 6, 12$): $N = 24(12)300$. The exact points were needed for $N = 24$ ($\alpha = .05$) and $N = 24, 36$ ($\alpha = .01$). The normal points were adequate for $N > 24$ ($\alpha = .05$) and $N > 300$ ($\alpha = .01$).⁴

Significance points for the Incomplete Beta approximation (see Section 7) are tabulated in terms of $2p$ and $2q$. The values of $2p$ and $2q$ are the same when $\mu_1(R) = 0$; for (c), (d), and (e) above these values are simply $N - 3$, $N - 5$, and $N - 11$, respectively. Hence, for two-tailed significance points for these cases, the ordinary correlation tables can be used with $N - 3$, $N - 5$, and $N - 11$ degrees of freedom, respectively.⁵ Also, our one-tailed significance points can be approximated by use of the 10% and 2% significance points for the ordinary correlation coefficient. 10%, 5%, 2%, 1%, and 0.1% two-tailed significance points have been tabulated by Fisher and Yates [6]. These significance points are accurate to three decimal places for the serial correlation coefficients as follows:

(c) $n = N - 3$ degrees of freedom: $N > 24$ ($\alpha = .05$); $N > 30$ ($\alpha = .01$),

(d) $n = N - 5$ degrees of freedom: $N > 24$ ($\alpha = .05$); $N > 30$ ($\alpha = .01$),

(e) $n = N - 11$ degrees of freedom: $N > 24$ ($\alpha = .05$ and $\alpha = .01$), where $\alpha$ is the one-tailed significance point. For semi-annual data (b), $2p = 2q = \frac{N^2 - 3N + 4}{N - 4}$, which is not an integer for $N > 12$. When $N = 12$, $2p = 2q = 14$, for which the ordinary correlation significance point is adequate for $\alpha = .05$.
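The statement about integer values of $2p$ for semi-annual data can be checked by a short division. The denominator $N - 4$ used below is an assumption inferred from the stated value $2p = 14$ at $N = 12$:

$$\frac{N^2 - 3N + 4}{N - 4} = N + 1 + \frac{8}{N - 4},$$

since $(N + 1)(N - 4) = N^2 - 3N - 4$. The quotient is therefore an integer only when $N - 4$ divides 8, that is, for even $N \geq 6$ only when $N = 6$, 8, or 12; at $N = 12$ it equals $13 + 1 = 14$, and for $N > 12$ the fraction $8/(N - 4)$ is strictly between 0 and 1.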


⁴ It should be noted that for (c), (d), and (e), an approximation given by Cochran [4] is easily computed and is more accurate than the normal approximation for the $\alpha = .01$ significance points.

if ^ It 'issd in computing the ordinary correlation 

coefficient when the sample means are first subtracted 





Details of computing techniques using the exact distribution are given by
R. L. Anderson [1] for computing values of Ji’ when - 0. 

3.2. Significance points of $R$ for other single-period trends. Significance points have
also been obtained for $P = 3$, $P = 4$, $P = 6$, and $P = 12$, for which $K = 3$.


TABLE 1

Exact significance points, R', for different fitted series*

       P = 2              P = 2, 4           P = 2, 3, 6     P = 2, 12/5, 3, 4, 6, 12
   N    .05    .01     N    .05    .01     N    .05    .01     N    .05    .01

   6   .495   .499     8   .636   .693    12   .592   .744    24   .441   .592
   8   .484   .607    12   .515   .601    18   .442   .592    36   .323   .445
  10   .453   .601    16   .439   .582    24   .369   .504    48   .267   .371
  12   .426   .572    20   .388   .523    30   .323   .445    60   .233   .325
  14   .402   .544    24   .351   .478    36   .291   .403    72   .209   .293
  16   .382   .519    28   .323   .441    42   .267   .371    84   .191   .268
  18   .364   .496    32   .300   .414    48   .248   .346    96   .177   .249
  20   .348   .476    36   .282   .391    54   .233   .325   108   .166   .234
  22   .334   .458    40   .267   .371    60   .220   .308   120   .157   .221
  24   .321   .442    44   .254   .354    66   .209   .293   132   .149   .210
  26   .310   .427    48   .243   .338    72   .200   .280   144   .142   .200
  28   .300   .414    52   .233   .325    78   .191   .268   156   .136   .192
  30   .290   .402    56   .224   .313    84   .184   .258   168   .131   .184
  32   .282   .390    60   .216   .302    90   .177   .249   180   .126   .178
  34   .274   .380    64   .209   .293    96   .172   .241   192   .122   .172
  36   .266   .370    68   .202   .284   102   .166   .234   204   .118   .166
  38   .260   .361    72   .197   .276   108   .161   .227   216   .115   .162
  40   .254   .353    76   .191   .268   114   .157   .221   228   .111   .157
  42   .248   .345    80   .186   .261   120   .153   .215   240   .108   .153
  44   .242   .338    84   .182   .255   126   .149   .210   252   .105   .149
  46   .237   .331    88   .177   .249   132   .145   .205   264   .103   .146
  48   .233   .324    92   .173   .243   138   .142   .200   276   .101   .142
  50   .228   .318    96   .170   .238   144   .139   .196   288   .099   .140
  52   .224   .313   100   .166   .234   150   .136   .192   300   .097   .136
  54   .220   .307
  56   .216   .302
  58   .212   .297
  60   .209   .292

* P = Periods Used in Fitted Series. The columns headed .05 and .01 give the values of $\alpha$.


In these cases, the distribution of R is asymmetrical. The Incomplete Beta
approximation is symmetrical for $P = 3$, with $2p = 2q = N - 2$, even though
the exact distribution is not.

The significance points for these single-period trends are given in Table 2. 






The exact distribution was required to compute the $\alpha = .01$ and .99 significance
points for N through 48 in all cases and also for most cases with $\alpha = .05$ and
.95. For $N > 48$, the Cochran approximation [4] gave the same results as the
Incomplete Beta approximation. Since this Cochran approximation can be computed
more rapidly, it should be used if other significance points are desired.
The normal approximation is not recommended because it is less accurate than
the Cochran approximation and requires almost as much calculation. For $\alpha = .01$
and .99, the significance points using the normal approximation were too large
(in absolute value) by from .0005 to .001 for the last entries in Table 2. The two-

TABLE 2 

Exact significance points, R', for single periods > 2 


.06 

.01 

N 

.496 

.500 


.475 

.619 

12 

.392 

.526 

18 

.340 

.463 

24 

.304 

.417 

30 

.277 

.382 

36 

,256 

.356 

42 

.240 

.334 

48 

.226 

.316 

54 

.214 

.300 

60 

,204 

.286 

66 

.195 

.274 

72 

.187 

.263 

78 

.181 

.254 

84 

.175 

.245 

90 

.169 

.237 

90 

.164 

.230 

102 

.159 

.224 

108 

155 

.218 

114 

.151 

.212 

120 


,651 j 
.509 
.427 
.273 ! 
.335 
.30t5 
.283 1 
.264 I 
.248 
.235 i 
,224 j 
' .214 ! 


.296 .5fHl 
.277 .440 

.254 .393 

.230 .359 

.220 I .332 
.207 j .311 


.197 .294 

.188 . 279 

.180 : .266 
.m .255 
.107 .246 

.161 .237 

.15t) .229 

.151 .222 

.147 .216 

.143 .210 

.140 ,205 

.137 .2tX) 









TABLE 2 (Continued)

P = 12


.Y 


it 



A 


a 






tw 



.00 

.0,5 

.05 

.01 

H 

- ,.HKP 

. 7US 

.503 

.(j37 

12 

--.778 

-.071 

,090 

.245 

12 

.ft.. 

.tiOS 

.420 

.m 

24 

“. .555 

-.444 

.197 

.330 

in 

, ,i'.43 

.502 

.300 

.522 

30 

-.447 

-.348 

.188 

.298 

20 

■ .070 

141 

.333 

.474 

48 

- .383 

-.293 

.175 

.270 

24 

-■ 510 

. 300 

,3(r( 

. 437 

no 

-.3.39 

— .257 

.103 

.249 

28 

• -,477 

..'fid 

.2K5 

. 107 

72 

-.307 

-.231 

.153 

.231 

32 

■ .445 

• .331 

.20K 

.383 

8-1 

- .283 

-.212 

. 14.5 

.217 

30 

■ .418 

.312 

.2.53 

.3(53 

tK, 

-.203 

-. 100 

.138 

.20(5 

40 

- .305 

- .203 

241 

.345 

108 ’ 

-.247 

-.183 

.132 

195 

44 

,37.5 

. -ir t 

.230 

.330 

120 i 

-.233 

-. 173 

,120 

,187 

48 

^,358 

.201 

,221 

.317 ] 132 i 

-.221 

-.104 

.121 

.180 

52 

-.343 

.2.52 

.213 

.305 

144 ' 

“ ,211 

-.1,50 

.117 

.173 

50 

,330 

.212 

,200 

.204 

1.50 ! 

-- .2(72 

-.149 

.113 

.107 

I’lO 

• .310 

.233 

.100 

.28.5 

108 

--.194 

-. 143 

.110 

. 102 

04 

.3tlh 

225 

.103 

.277 

m \ 

-.187 

- .138 

.107 

,157 

08 

.208 

■ .2IK 

,lHh 

,200 

102 * 

-.181 

- . 133 

.104 

. 153 

72 

- .28‘l 

' .21! 

.1H3 

.202 

20-1 ] 

-. 175 

-.128 

.101 

.149 

70 


,20.5 

.178 

.25.5 

210 j 

-.170 

- 124 

.099 

.145 

80 

- .274 

- . ioy 

.174 

.240 

228 i 

-.105 

-.121 

.097 

.141 

84 

- .207 

‘ . 1!!4 

.170 

.243 

240 j 

-.101 

-.117 

.094 

.138 

88 

-.201 

• . 180 

.100 

.238 

252 j 

-. 157 

-.114 

.092 

.135 

1)2 

“ .255 

■.IHl 

.102 

.233 1 20-1 ! 

-.1.53 

-.111 

,091 

.132 

90 

•™.2!0 

• .18(1 

.150 

.228 

270 i 

- . 149 

-.109 

.089 

.130 

100 

•-•,244 

-• .170 

. 150 

.223 

288 j 

- . 140 

-.100 

.087 

.127 

108 

- .234 

• .100 

.150 

.215 

300 ’ 

-.143 

-.104 

.080 

,125 

120 

- .221 

-•.ItiO 

.143 , 

.205 

! 





132 

- .210 

"■.152 

. 130 

.UM5 

i 





144 

~“.20l 

"‘.145 

.131 , 

.187 

1 






tailed significance points cannot be obtained from the ordinary correlation
tables except for $P = 3$.

3.3. Example of use of significance points. As an example of the use of these
significance points, $R'$, we shall consider the following data [17] on the receipts
of butter (in units of 1,000,000 pounds) at five markets (Boston, Chicago, San
Francisco, Milwaukee, and St. Louis). The figures in parentheses are deviations
from the average of the given months over the 3 years.








                              Year
Month        1935             1936             1937            Total    Average

Jan.       48.9 (2.4)       48.3 (1.8)       42.4 (-4.1)       139.6     46.5
Feb.       43.4 (-0.6)      47.1 (3.1)       41.4 (-2.6)       131.9     44.0
March      43.8 (-4.6)      52.4 (4.0)       49.0 (0.6)        145.2     48.4
April      50.8 (-1.5)      55.3 (3.0)       50.8 (-1.5)       156.9     52.3
May        67.6 (1.6)       64.7 (-1.3)      65.8 (-0.2)       198.1     66.0
June       83.7 (0.7)       79.5 (-3.5)      85.9 (2.9)        249.1     83.0
July       82.7 (10.7)      62.6 (-9.4)      70.6 (-1.4)       215.9     72.0
Aug.       60.8 (4.8)       51.3 (-4.7)      55.8 (-0.2)       167.9     56.0
Sept.      55.4 (3.6)       51.0 (-0.8)      49.1 (-2.7)       155.5     51.8
Oct.       48.4 (-1.0)      54.0 (4.6)       45.7 (-3.7)       148.1     49.4
Nov.       37.7 (-4.5)      45.2 (3.0)       43.8 (1.6)        126.7     42.2
Dec.       41.0 (-3.2)      44.9 (0.7)       46.7 (2.5)        132.6     44.2

Total     664.2 (8.4)      656.3 (0.5)      647.0 (-8.8)      1967.5    655.8
Average   55.35 (0.70)     54.69 (0.04)     53.92 (-0.73)     163.96    54.65


We assume that the trend is composed of the 12 terms having periods that
divide 12. We shall test the null hypothesis that the deviations from the trend
are independently distributed against the alternative that there is positive
serial correlation. The fitted series is of the form

(10) $m_t = b_0 + b_1 \cos\frac{\pi t}{6} + b_2 \sin\frac{\pi t}{6} + \cdots + b_9 \cos\frac{5\pi t}{6} + b_{10} \sin\frac{5\pi t}{6} + b_{11} \cos \pi t$;

here we find it convenient to use the notation $b_0, b_1, \ldots, b_{11}$ for the
coefficients (with a different relationship between the subscripts and the
trigonometric functions than in (4)). We find that the $m_t$ are simply the average receipts
given for each month in the above table (46.5, 44.0, ..., 44.2). Hence the
deviations $x_t - m_t$ are the figures in parentheses (2.4, -0.6, ..., 2.5). The
calculated lag 1 circular serial correlation coefficient is

(11) $R = \frac{(2.4)(-0.6) + (-0.6)(-4.6) + \cdots + (1.6)(2.5) + (2.5)(2.4)}{(2.4)^2 + (-0.6)^2 + \cdots + (2.5)^2} = \frac{232.18}{474.51} = 0.489.$


Since $R = 0.489$ exceeds the one-tailed significance point $R'(.01) = .445$ given in
Table 1 for $N = 36$ and $P = 2, 12/5, 3, 4, 6, 12$, the null hypothesis of zero serial
correlation would be rejected (against the alternative single-tail hypothesis,
$\rho > 0$). If we had been interested in the two-tailed alternative hypothesis,
$\rho \neq 0$, we would use the ordinary correlation tables
with $N - 11 = 25$ degrees of freedom and we would find that for the two-tailed
test $R'(.01) = 0.487$. Our value is significant at the 5% level and barely
significant at the 1% level.
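
The arithmetic behind (11) can be retraced with a short sketch. The list below holds the 36 deviations in time order, read from the figures in parentheses in the table; the April 1937 deviation (-1.5) is taken to be the value implied by the April row total of 156.9, and the function name `circular_serial_correlation` is ours, not the paper's.

```python
# Deviations from the monthly averages, in time order (Jan. 1935 ... Dec. 1937).
deviations = [
    2.4, -0.6, -4.6, -1.5, 1.6, 0.7, 10.7, 4.8, 3.6, -1.0, -4.5, -3.2,   # 1935
    1.8, 3.1, 4.0, 3.0, -1.3, -3.5, -9.4, -4.7, -0.8, 4.6, 3.0, 0.7,     # 1936
    -4.1, -2.6, 0.6, -1.5, -0.2, 2.9, -1.4, -0.2, -2.7, -3.7, 1.6, 2.5,  # 1937
]


def circular_serial_correlation(residuals, lag=1):
    """Return (numerator, denominator, R) of the circular serial correlation."""
    n = len(residuals)
    # Circular definition: the last residual is paired with the first.
    numerator = sum(residuals[i] * residuals[(i + lag) % n] for i in range(n))
    denominator = sum(r * r for r in residuals)
    return numerator, denominator, numerator / denominator


num, den, r = circular_serial_correlation(deviations)
print(round(num, 2), round(den, 2), round(r, 3))  # 232.18 474.51 0.489
```

The wrap-around product (2.5)(2.4) is what makes the coefficient circular; dropping it would give the ordinary (non-circular) lag 1 sum of products.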

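The values of $b_j$ in (10) are as follows:

(12) $b_0 = \frac{1}{36}\sum_{t=1}^{36} x_t$, $\quad b_{2k-1} = \frac{1}{18}\sum_{t=1}^{36} x_t \cos\frac{k\pi t}{6}$, $\quad b_{2k} = \frac{1}{18}\sum_{t=1}^{36} x_t \sin\frac{k\pi t}{6}$ $(k = 1, \ldots, 5)$, $\quad b_{11} = \frac{1}{36}\sum_{t=1}^{36} x_t \cos \pi t$.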

The computed values of $b_0, \ldots, b_{11}$ are 54.65, -14.82, -2.02, 6.60, 1.23, -3.98,
0.30, 2.21, 1.73, -0.61, 0.60, 0.15, respectively. However, it is not necessary to
compute these values in order to obtain $m_t$. The problem of estimating the
variances of these b's will be discussed in Section 4.
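
As an illustration, the coefficients of (10) can be recomputed from the receipts. This is a sketch under stated assumptions: $t = 1$ is taken to be January 1935, the June 1936 entry is taken as 79.5 (the value implied by the row and column totals), and the helper names `fourier_coefficients` and `fitted_value` are hypothetical.

```python
import math

# Receipts in time order, Jan. 1935 ... Dec. 1937 (millions of pounds).
receipts = [
    48.9, 43.4, 43.8, 50.8, 67.6, 83.7, 82.7, 60.8, 55.4, 48.4, 37.7, 41.0,
    48.3, 47.1, 52.4, 55.3, 64.7, 79.5, 62.6, 51.3, 51.0, 54.0, 45.2, 44.9,
    42.4, 41.4, 49.0, 50.8, 65.8, 85.9, 70.6, 55.8, 49.1, 45.7, 43.8, 46.7,
]


def fourier_coefficients(x):
    """Least squares coefficients b_0, ..., b_11 of the 12-term series (10)."""
    n = len(x)
    times = range(1, n + 1)
    b = [sum(x) / n]  # b_0: the general mean
    for k in range(1, 6):
        angle = k * math.pi / 6
        b.append(2 / n * sum(v * math.cos(angle * t) for t, v in zip(times, x)))
        b.append(2 / n * sum(v * math.sin(angle * t) for t, v in zip(times, x)))
    # b_11 multiplies cos(pi t) = (-1)^t, which has no sine partner.
    b.append(1 / n * sum(v * math.cos(math.pi * t) for t, v in zip(times, x)))
    return b


def fitted_value(b, t):
    """Evaluate the fitted series (10) at time t."""
    value = b[0] + b[11] * math.cos(math.pi * t)
    for k in range(1, 6):
        angle = k * math.pi * t / 6
        value += b[2 * k - 1] * math.cos(angle) + b[2 * k] * math.sin(angle)
    return value


b = fourier_coefficients(receipts)
print([round(v, 2) for v in b])
```

Because the twelve terms saturate a 12-month cycle, `fitted_value(b, 1)` reproduces the January average, which is why the text can read the $m_t$ directly from the table.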


4. Testing the hypothesis of lack of serial correlation.

4.1. Statement of the problem. Consider the N random variables $u_1, \ldots, u_N$,
each normally and independently distributed with mean 0 and variance $\sigma^2$.
Define the N variables $x_1, \ldots, x_N$ by the equations

(13) $x_i - \mu_i = \rho(x_{i-L} - \mu_{i-L}) + u_i$ $\quad (i = 1, \ldots, N)$,

where

(14) $x_{-j} = x_{N-j}$, $\quad \mu_{-j} = \mu_{N-j}$ $\quad (j = 0, 1, \ldots, N - 1)$

and $\mu_i$ is the linear combination of trigonometric functions given in (2). If L
and N are relatively prime (in particular, if $L = 1$), the Jacobian of the
transformation from $\{u_i\}$ to $\{x_i\}$ is $1 - \rho^N$, and the probability density of $\{x_i\}$ is

(15) $\frac{1 - \rho^N}{(2\pi)^{N/2}\sigma^N} \exp\left(-\frac{Q}{2\sigma^2}\right)$,

where $Q = (1 + \rho^2)\sum_{i=1}^{N}(x_i - \mu_i)^2 - 2\rho\sum_{i=1}^{N}(x_i - \mu_i)(x_{i-L} - \mu_{i-L})$. For $L = 1$
the covariance between $x_i$ and $x_j$ is $\sigma^2(\rho^{|i-j|} + \rho^{N-|i-j|})/[(1 - \rho^2)(1 - \rho^N)]$. If
$L = qa$ and $N = pa$, where p, q, and a are positive integers and q and p are
relatively prime, then the Jacobian is $(1 - \rho^p)^a$ and the density of $\{x_i\}$ is

(16) $\frac{(1 - \rho^p)^a}{(2\pi)^{N/2}\sigma^N} \exp\left(-\frac{Q}{2\sigma^2}\right)$.





We shall now obtain the likelihood ratio test of the hypothesis $H_0: \rho = 0$ on
the basis of a sample consisting of one observation on each $x_i$.

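4.2. Preliminary transformations. We shall find it convenient to express $\mu_i$ in
terms of fixed variates $\phi_{ij}$ having certain properties. Later we will verify that
the $\phi$'s are simply constant multiples of the trigonometric terms in (2). We
suppose now that

(17) $\mu_i = \sum_{j=1}^{K'} \phi_{ij}\gamma_j$ $\quad (i = 1, \ldots, N)$,

where $K' < N$, the $\{\gamma_j\}$ are parameters, and the $\phi_{ij}$ are known functions of i and
j satisfying

(18) $\phi_{i-L,j} + \phi_{i+L,j} = 2\lambda_{Lj}\phi_{ij}$ $\quad (i = 1, \ldots, N;\ j = 1, \ldots, K')$,

(19) $\sum_{i=1}^{N} \phi_{ij}\phi_{ik} = \delta_{jk}$ $\quad (j, k = 1, \ldots, K')$,

(20) $\phi_{-i,j} = \phi_{N-i,j}$ $\quad (i = 0, 1, \ldots, N - 1)$,

and $\delta_{jk}$ is the Kronecker delta. Let

(21) $m_i = \sum_{j=1}^{K'} \phi_{ij} c_j$,

where

(22) $c_j = \sum_{i=1}^{N} \phi_{ij} x_i$.

Then by usual regression theory we have

(23) $\sum_{i=1}^{N} (x_i - m_i)\phi_{ij} = 0$,

(24) $\sum_{i=1}^{N} (x_i - \mu_i)^2 = \sum_{i=1}^{N} (x_i - m_i)^2 + \sum_{j=1}^{K'} (c_j - \gamma_j)^2$,

because $c_j$ is the least squares estimate of $\gamma_j$. Let us evaluate

(25) $\sum_{i=1}^{N} (x_i - \mu_i)(x_{i-L} - \mu_{i-L})$
$= \sum_{i=1}^{N} [(x_i - m_i) + (m_i - \mu_i)][(x_{i-L} - m_{i-L}) + (m_{i-L} - \mu_{i-L})]$
$= \sum_{i=1}^{N} (x_i - m_i)(x_{i-L} - m_{i-L}) + \sum_{i=1}^{N}\sum_{j=1}^{K'} \phi_{i-L,j}(c_j - \gamma_j)(x_i - m_i)$
$\quad + \sum_{i=1}^{N}\sum_{j=1}^{K'} \phi_{ij}(c_j - \gamma_j)(x_{i-L} - m_{i-L}) + \sum_{i=1}^{N}\sum_{j,k=1}^{K'} \phi_{ik}\phi_{i-L,j}(c_k - \gamma_k)(c_j - \gamma_j)$.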





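Call the first term on the right hand side of (25) ${}_LC$. In view of (20) the next two
terms are

(26) $\sum_{j=1}^{K'}\sum_{i=1}^{N} (x_i - m_i)(\phi_{i-L,j} + \phi_{i+L,j})(c_j - \gamma_j)$.

This is seen to be zero by consideration of (18) and (23). The last term can be
written

(27) $\sum_{i=1}^{N}\sum_{j,k=1}^{K'} \phi_{ik}\phi_{i-L,j}(c_k - \gamma_k)(c_j - \gamma_j) = \sum_{j=1}^{K'} \lambda_{Lj}(c_j - \gamma_j)^2$

by use of (18), (19), and (20). Thus

(28) $\sum_{i=1}^{N} (x_i - \mu_i)(x_{i-L} - \mu_{i-L}) = \sum_{i=1}^{N} (x_i - m_i)(x_{i-L} - m_{i-L}) + \sum_{j=1}^{K'} \lambda_{Lj}(c_j - \gamma_j)^2$.

It follows that

(29) $Q = (1 + \rho^2)\sum_{i=1}^{N}(x_i - m_i)^2 - 2\rho\sum_{i=1}^{N}(x_i - m_i)(x_{i-L} - m_{i-L}) + \sum_{j=1}^{K'}(1 + \rho^2 - 2\rho\lambda_{Lj})(c_j - \gamma_j)^2$.

We can complete the matrix $\Phi = (\phi_{ij})$ so that $\Phi$ is an N-th order square matrix
with elements satisfying (18), (19), and (20). If we make the transformation

(30) $x_i = \sum_{j=1}^{N} \phi_{ij} c_j$ $\quad (i = 1, \ldots, N)$,

then

(31) $\sum_{i=1}^{N} (x_i - m_i)^2 = \sum_{j=K'+1}^{N} c_j^2$,

(32) $\sum_{i=1}^{N} (x_i - m_i)(x_{i-L} - m_{i-L}) = \sum_{j=K'+1}^{N} \lambda_{Lj} c_j^2$.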
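
Identities (31) and (32) can be checked numerically. The sketch below assumes the usual orthonormal trigonometric basis over $t = 1, \ldots, N$ (constant, cosine and sine pairs, and $(-1)^t$ when N is even) as the completed matrix $\Phi$, and assumes that a fitted frequency brings in both its cosine and its sine term; the function names are ours.

```python
import math
import random


def orthonormal_trig_basis(n):
    """Return (columns, freqs): N orthonormal trigonometric columns and their frequencies."""
    columns, freqs = [], []
    for f in range(n // 2 + 1):
        angle = 2 * math.pi * f / n
        if f == 0 or 2 * f == n:
            # Constant term and (-1)^t have no sine partner.
            columns.append([math.cos(angle * t) / math.sqrt(n) for t in range(1, n + 1)])
            freqs.append(f)
        else:
            scale = math.sqrt(2.0 / n)
            columns.append([scale * math.cos(angle * t) for t in range(1, n + 1)])
            columns.append([scale * math.sin(angle * t) for t in range(1, n + 1)])
            freqs.extend([f, f])
    return columns, freqs


def check_identities(n=12, lag=1, fitted_freqs=(0, 1), seed=0):
    """Return both sides of (31) and of (32) for a random series x."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    columns, freqs = orthonormal_trig_basis(n)
    c = [sum(col[i] * x[i] for i in range(n)) for col in columns]        # (22)
    fitted = [j for j in range(n) if freqs[j] in fitted_freqs]
    unfitted = [j for j in range(n) if freqs[j] not in fitted_freqs]
    m = [sum(c[j] * columns[j][i] for j in fitted) for i in range(n)]    # (21)
    resid = [x[i] - m[i] for i in range(n)]
    lhs31 = sum(r * r for r in resid)
    rhs31 = sum(c[j] ** 2 for j in unfitted)
    lhs32 = sum(resid[i] * resid[(i - lag) % n] for i in range(n))       # circular lag
    rhs32 = sum(math.cos(2 * math.pi * lag * freqs[j] / n) * c[j] ** 2 for j in unfitted)
    return lhs31, rhs31, lhs32, rhs32


print(check_identities())
```

In (32) the cross products between the cosine and sine coefficients of one frequency cancel in pairs, which is why only squares $c_j^2$ remain on the right hand side.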

4.3. The likelihood ratio criterion. To obtain the likelihood ratio test of the
hypothesis $H_0: \rho = 0$ against alternative hypotheses $H_1: \rho \neq 0$, we divide the
maximum of the likelihood assuming $H_0$ by the maximum of the likelihood
assuming $H_1$. It is clear from (16) and (29) that if $H_0$ is true, the maximum
likelihood estimates of $\gamma_j$ and $\sigma^2$ are $c_j$ and

(33) $s_0^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - m_i)^2$

respectively. If $H_1$ is true, the maximum likelihood estimate of $\gamma_j$ is $c_j$. To state
the maximum likelihood estimates of $\sigma^2$ and $\rho$ under $H_1$ it is convenient to define
${}_LR$, the sample serial coefficient of lag L, as

(34) ${}_LR = \frac{1}{N s_0^2}\sum_{i=1}^{N} (x_i - m_i)(x_{i-L} - m_{i-L})$.



Then the maximum likelihood estimate of $\sigma^2$ under $H_1$ is

(35) $s^2 = s_0^2(1 + \hat\rho^2 - 2\hat\rho\,{}_LR)$,

where $\hat\rho$ is the maximum likelihood estimate of $\rho$ and satisfies

(36) ${}_LR(1 + \hat\rho^N) - \hat\rho(1 + \hat\rho^{N-2}) = 0$,

if L and N are relatively prime and satisfies

(37) ${}_LR(1 + \hat\rho^p) - \hat\rho(1 + \hat\rho^{p-2}) = 0$,

if $L = qa$, $N = pa$, and p and q are relatively prime.

Upon substituting these estimates into the likelihood function we find that
the likelihood ratio criterion is

(38) $\lambda = \frac{(1 + \hat\rho^2 - 2\hat\rho\,{}_LR)^{N/2}}{1 - \hat\rho^N}$

if L and N are relatively prime and

(39) $\lambda = \frac{(1 + \hat\rho^2 - 2\hat\rho\,{}_LR)^{N/2}}{(1 - \hat\rho^p)^a}$

if $L = qa$, $N = pa$ and p and q are relatively prime. The maximum likelihood
estimate of $\rho$ is the root of (36) or (37) that makes (38) or (39), respectively, a
minimum. It should be noticed that throughout this section $\rho$ could be replaced
by $1/\rho$ (and changing $\sigma^2$ by a factor $1/\rho^2$). To make the maximum likelihood
estimate unique, we require that $|\hat\rho| \le 1$. It can be shown that there exists one
and only one root of (36) or (37) that satisfies this requirement, and minimizes
$\lambda$. (There is a peculiarity to this solution in that if N is odd, $L = 1$, and ${}_LR <
-1 + 2/N$, then $\hat\rho = -1$ is the root minimizing $\lambda$.) In any case, $\lambda$ is a function
of ${}_LR$. We have shown that for $0 < {}_LR < 1$, it is a monotonic decreasing
function; and for $-1 < {}_LR < 0$, it is a monotonic increasing function. A critical
region defined by $\lambda < \lambda_0$ can, therefore, be defined by ${}_LR < R_1 < 0$ and $0 < R_2 <
{}_LR$. (The probability that $R = -1$ or $+1$ is 0.) Thus we can use ${}_LR$ to test the
null hypothesis $H_0: \rho = 0$ instead of the likelihood ratio criterion (against
one-sided alternatives they are equivalent). The strongest justification for the use
of ${}_LR$ in testing $H_0: \rho = 0$ is that for circular distributions the uniformly most
powerful tests against one-sided alternatives and the $B_1$ test against two-sided
alternatives are given in terms of inequalities on ${}_LR$ [3].
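
A minimal numerical sketch of (36) and (38) follows, assuming L and N relatively prime and a sign change of the left side of (36) between 0 and ${}_LR$ (which holds for $0 < {}_LR < 1$, and for negative ${}_LR$ when N is even). Bisection is our choice of root finder, not a method taken from the paper, and the function names are hypothetical.

```python
def mle_rho(r, n, tol=1e-13):
    """Root of (36), r(1 + rho^n) - rho(1 + rho^(n-2)) = 0, lying between 0 and r."""

    def g(rho):
        return r * (1 + rho ** n) - rho * (1 + rho ** (n - 2))

    a, b = sorted((0.0, r))
    if g(a) * g(b) > 0:
        raise ValueError("no sign change between 0 and r; bisection does not apply")
    while b - a > tol:
        mid = 0.5 * (a + b)
        # Keep the half interval on which g changes sign.
        if g(a) * g(mid) <= 0:
            b = mid
        else:
            a = mid
    return 0.5 * (a + b)


def likelihood_ratio(r, n):
    """Likelihood ratio criterion (38) evaluated at the root of (36)."""
    rho = mle_rho(r, n)
    return (1 + rho * rho - 2 * rho * r) ** (n / 2) / (1 - rho ** n)


rho_hat = mle_rho(0.489, 36)
lam = likelihood_ratio(0.489, 36)
print(rho_hat, lam)
```

For the butter example ($R = 0.489$, $N = 36$) the root is practically indistinguishable from R itself, since the terms in $\hat\rho^N$ are negligible; the criterion is then close to $(1 - R^2)^{N/2}$.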

${}_LR$ is a consistent estimate of $\rho$. In fact, ${}_LR$ is asymptotically a root of
(36) or (37). This is proved by showing that ${}_LR(1 + {}_LR^N) - {}_LR(1 + {}_LR^{N-2})$
converges stochastically to zero. (${}_LR$ is the maximum likelihood estimate for $\rho$
obtained by neglecting the Jacobian, that is, by treating it as a constant.) We
shall use ${}_LR$ both to estimate $\rho$ and to test hypotheses about this parameter.

We now take the $\phi_{ij}$ to be the trigonometric terms
indicated in Section 1. In the rest of the paper we shall let the index g run from
0 to $\frac{1}{2}N$ for N even and from 0 to $\frac{1}{2}(N - 1)$ for N odd; we let the index h run
from 1 to $\frac{1}{2}N - 1$ for N even and from 1 to $\frac{1}{2}(N - 1)$ for N odd. We shall use a
prime to denote an index running over those values corresponding to fitted terms
and a double prime to denote an index running over those values corresponding
to terms not fitted.

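Let the N trigonometric functions of t, namely $\cos\frac{2\pi g t}{N}$ and $\sin\frac{2\pi h t}{N}$, be
numbered from 1 to N such that the fitted terms are numbered from 1 to $K'$ and the
non-fitted terms from $K' + 1$ to N. According to this numbering we define $\phi_{t1}$ to
$\phi_{tN}$ as

(40) $\sqrt{\frac{2}{N}}\cos\frac{2\pi g t}{N}$ $\ (g \neq 0, \tfrac{1}{2}N)$, $\quad \frac{1}{\sqrt{N}}\cos\frac{2\pi g t}{N}$ $\ (g = 0, \tfrac{1}{2}N)$,

(41) $\sqrt{\frac{2}{N}}\sin\frac{2\pi h t}{N}$.

Defined this way, the $\phi_{tj}$ satisfy (18) and (19) and (20). It can be shown by using
the addition formulas for sines and cosines that

(42) $\lambda_{Lj} = \cos\frac{2\pi L f}{N}$,

where $f = g$ or $f = h$ depending on whether j refers to a term (40) or (41). We
shall assume that the numbering of trigonometric functions is such that

(43) $\lambda_{L,K'+1} \ge \lambda_{L,K'+2} \ge \cdots \ge \lambda_{L,N}$.

It can easily be seen that (2) is of the form (17) except that $\alpha_{g'}$ and $\beta_{h'}$ must
be multiplied by $\sqrt{N/2}$ unless $g' = 0$ or $\frac{1}{2}N$, and by $\sqrt{N}$ for $g' = 0$ or $\frac{1}{2}N$, to obtain
$\gamma_j$. The regression coefficients $a_{g'}$ and $b_{h'}$ are similarly related to the $c_j$.
It can be seen from (29) that the $a_{g'}$ and $b_{h'}$ are independently distributed with
variance $2\sigma^2/[N(1 + \rho^2 - 2\rho\cos\frac{2\pi L f'}{N})]$ for $f' \neq 0, \frac{1}{2}N$, and variance $\sigma^2/[N(1 - \rho)^2]$
for $f' = 0$ and for $f' = \frac{1}{2}N$ if L is even and $\sigma^2/[N(1 + \rho)^2]$ for $f' = \frac{1}{2}N$ if L is odd.
In these variance formulas we can estimate $\sigma^2$ from (35) using ${}_LR$ for $\hat\rho$ and $\rho$.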


5. The exact distribution of R.

5.1. Introduction. Under the null hypothesis $H_0: \rho = 0$ the observations $\{x_i\}$
are normally and independently distributed with variance $\sigma^2$ and means $E x_i = \mu_i$.
The variables $c_j$ defined by (22) and (30) are normally and independently
distributed with variance $\sigma^2$ and means $\gamma_j$. For $j > K'$, $\gamma_j = 0$. It follows from
(31), (32), (33), and (34) that

(44) ${}_LR = \frac{\sum_{j=K'+1}^{N} \lambda_{Lj} c_j^2}{\sum_{j=K'+1}^{N} c_j^2}$,

where the $\lambda_{Lj}$ are given by (42) corresponding to the $K'' = (N - K')$
trigonometric terms not fitted. Thus to obtain the distribution of ${}_LR$ we need only
consider the joint distribution of $\{c_j\}$, $j = K' + 1, \ldots, N$. If $H_1$ is true, the
joint density of all the $c_j$ is (15), where

(45) $Q = (1 + \rho^2)V - 2\rho\,{}_LC + \sum_{j=1}^{K'} (1 + \rho^2 - 2\rho\lambda_{Lj})(c_j - \gamma_j)^2$

and

$V = \sum_{j=K'+1}^{N} c_j^2$ and ${}_LC = \sum_{j=K'+1}^{N} \lambda_{Lj} c_j^2$.

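5.2. Some special distributions of ${}_1R = R$. If the constant term ($g' = 0$) is
fitted and the other terms are fitted in pairs ($\cos\frac{2\pi f' t}{N}$ and $\sin\frac{2\pi f' t}{N}$), $K'$
is odd. If N is odd, then $K''$ is even; the $\lambda_{1j}$ occur in pairs and we can define
$\lambda_k$ as

(46) $\lambda_{1,K'+1} = \lambda_{1,K'+2} = \lambda_1 > \lambda_{1,K'+3} = \lambda_{1,K'+4} = \lambda_2 > \cdots > \lambda_{1,N-1} = \lambda_{1,N} = \lambda_{\frac{1}{2}K''}$.

This also holds if N is even and if, in addition to the constant term and paired
cosines and sines, we fit $\cos \pi t = (-1)^t$ ($g' = N/2$). If N is even and we do not
fit $\cos \pi t$, we have $K''$ odd. Then

(47) $\lambda_{1,K'+1} = \lambda_{1,K'+2} = \lambda_1 > \lambda_{1,K'+3} = \lambda_{1,K'+4} = \lambda_2 > \cdots > \lambda_{1,N-2} = \lambda_{1,N-1} = \lambda_{\frac{1}{2}(K''-1)} > \lambda_{1,N} = \lambda_{\frac{1}{2}(K''+1)} = -1$.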

The general expression for the distribution of R in these cases has been found 
by one of the authors [2]. In this case the cumulative distribution function is 
1 minus 


Pr{R > R'] = g I 7, l(x" _ 

Cl ^R' < \Z, 

where 7i is found from a result of Lehmann [9] to be 
2 K«'+i) 

Bin _ 

N 


(48) 


(49) 


N 


sm Bin 


Xy)) 


where f is such that Xi' = cos ^ and the product onf is over the K' terms 

i K'rSdk'J a?’ ~ 1 values in K^' ~ D pairs 

also fTm ^ ~ ^^ -1 if is even. W© can 





7* = 


(50) 


N 


. 2ir/ 

sm am 


N 


i°^'n /.m ’-(«' +n ,fa 






am 


i7 N 

5.3. Some special distributions of ${}_LR$ for $L > 1$. We have noted in (44) above
that $\lambda_{Lj} = \cos\frac{2\pi L f''}{N}$, where $f''$ corresponds to a term not used in the estimation
equations for $m_t$, which was a function of $\{\cos\frac{2\pi f' t}{N}, \sin\frac{2\pi f' t}{N}\}$. If L is
relatively prime to N, the distribution is the same as that given above for $L = 1$,
except for the re-evaluating of the $\lambda_k$. In the article by R. L. Anderson [2],
where only the constant term in $m_t$ was used, the $\lambda_k$ for lag L were exactly the
same as the $\lambda_k$ for lag 1. However, this will not be the case for other terms
used in $m_t$. For example, consider lag 2 and N odd with $m_t$ consisting of the
constant term plus terms in $\cos\frac{2\pi t}{N}$ and $\sin\frac{2\pi t}{N}$. In this case the $\lambda_k$ for lag 1 are

$\cos\frac{4\pi}{N},\ \cos\frac{6\pi}{N},\ \ldots,\ \cos\frac{(N-1)\pi}{N}$,

and the $\lambda_k$ for lag 2 are

$\cos\frac{2\pi}{N},\ \cos\frac{6\pi}{N},\ \cos\frac{8\pi}{N},\ \ldots,\ \cos\frac{(N-1)\pi}{N}$.

Next suppose the highest common factor of L and N is a (as before, $L = qa$
and $N = pa$, with p and q relatively prime). In this case

(51) $\lambda_{Lj} = \cos\frac{2\pi q f''}{p}$.

Since p and q are relatively prime, the results are the same as for q replaced by 1
and L replaced by a. Each root is repeated a times.
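
The re-evaluation of the roots for $L > 1$ is easy to tabulate. The sketch below lists the distinct values of $\cos(2\pi L f''/N)$ over the unfitted frequencies; the helper name `distinct_lambdas` and the choices $N = 11$ and $N = 12$ are ours, and a fitted frequency is assumed to remove both its cosine and its sine term.

```python
import math


def distinct_lambdas(n, lag, fitted_freqs):
    """Distinct values of cos(2 pi L f / N) over unfitted frequencies f, largest first."""
    values = set()
    for f in range(n // 2 + 1):
        if f in fitted_freqs:
            continue
        # Rounding merges values that differ only by floating point noise.
        values.add(round(math.cos(2 * math.pi * lag * f / n), 9))
    return sorted(values, reverse=True)


def cosines(n, ks):
    """cos(2 pi k / N) for the listed k, rounded the same way, largest first."""
    return sorted((round(math.cos(2 * math.pi * k / n), 9) for k in ks), reverse=True)


# N odd, constant plus the pair cos(2 pi t / N), sin(2 pi t / N) fitted.
lag1 = distinct_lambdas(11, 1, {0, 1})
lag2 = distinct_lambdas(11, 2, {0, 1})
print(lag1 == cosines(11, [2, 3, 4, 5]))  # lag 1: cos(4 pi/N), ..., cos((N-1) pi/N)
print(lag2 == cosines(11, [1, 3, 4, 5]))  # lag 2: cos(2 pi/N) enters, cos(4 pi/N) drops out

# Common factor: N = 12, L = 4 (a = 4, p = 3, q = 1), constant only fitted.
print(distinct_lambdas(12, 4, {0}))
```

With a common factor the distinct roots collapse to the p-th roots pattern of (51); for $N = 3L$ only the values $+1$ and $-\frac{1}{2}$ remain.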

$N = 2L$ $(p = 2)$

If we let $N = 2L$, $\lambda_k = \cos \pi k = +1$ or $-1$. $\lambda = +1$ corresponds to these
fitted terms in $m_t$: $\{1, \cos\frac{2\pi g' t}{N}, \sin\frac{2\pi h' t}{N}\}$ for $g'$, $h'$ even; $\lambda = -1$ corresponds
to these terms: $\{\cos\frac{2\pi g' t}{N}, \sin\frac{2\pi h' t}{N}\}$ for $g'$, $h'$ odd. Let $L - n_1$ be the number
of terms pertaining to $\lambda = +1$ and $L - n_2$ be the number of terms for
$\lambda = -1$. Then, as in [2], we have the density

(52) $D(R_2) = \frac{(1 - R_2)^{\frac{1}{2}n_2 - 1}(1 + R_2)^{\frac{1}{2}n_1 - 1}}{2^{\frac{1}{2}(n_1 + n_2) - 1} B(\frac{1}{2}n_1, \frac{1}{2}n_2)}$,

where $R_2$ was the notation used for lag L and $p = 2$. The cumulative function
is the Incomplete Beta function, found by setting $x = \frac{1}{2}(1 - R')$.

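$N = 3L$ $(p = 3)$

If we let $N = 3L$, $\lambda_k = \cos\frac{2\pi k}{3} = +1, -\frac{1}{2}$. The fitted terms in $m_t$
corresponding to $\lambda = 1$ are $\{1, \cos\frac{2\pi g' t}{N}, \sin\frac{2\pi h' t}{N}\}$ for $g', h' = 3m$. Similarly, those
corresponding to $\lambda = -\frac{1}{2}$ have $g', h' = 3m - 1$ or $3m - 2$. Let the number
of fitted terms with $\lambda = +1$ be $L - n_1$ and with $\lambda = -\frac{1}{2}$ be $2L - n_2$. Then

(53) $D(R_3) = \frac{2^{\frac{1}{2}n_2}(1 + 2R_3)^{\frac{1}{2}n_1 - 1}(1 - R_3)^{\frac{1}{2}n_2 - 1}}{3^{\frac{1}{2}(n_1 + n_2) - 1} B(\frac{1}{2}n_1, \frac{1}{2}n_2)}$,

where $R_3 > -\frac{1}{2}$. This cumulative function is also an Incomplete Beta function,
found by setting $x = 2(1 - R')/3$.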


$N = 4L$ $(p = 4)$

If $N = 4L$, $\lambda_k = \cos\frac{2\pi k}{4} = +1, 0, -1$. The fitted terms in $m_t$ corresponding
to $\lambda = 1$ have $f' = 4m$, those for $\lambda = -1$ have $f' = 4m - 2$; and those
for $\lambda = 0$ have $f' = 4m - 1$ or $4m - 3$. Let the number of terms in $m_t$ of each
sort be $L - n_1$, $L - n_2$, and $2L - n_3$, respectively. Then


(64) D(R) = c 


(1 + ^ (1 y)Jt"i-n 

Jp^O 

• [(1 - fi) - y(l + dy, 

(1 - f'^ (1 - 

vv -*0 


for i? < 0, 


i • 1(1 + R) -3/(1 - dy, for fl > 0, 

where B is andc = r(KTii + nj + n,])/[r(^ni)r(in,)r(in 3 ) 2 ‘''‘'+"«"^’]. 

5.4. The exact distribution of ${}_1R$ when $\rho \neq 0$. The joint distribution of the
observations for lag 1 when the null hypothesis is not true ($\rho \neq 0$) is (16), where
Q is given by (45) with $L = 1$ and $C = RV$. V, R, $\{c_j\}$ $(j = 1, \ldots, K')$ are a
sufficient set of statistics for estimating $\sigma^2$, $\rho$, and $\{\gamma_j\}$ $(j = 1, \ldots, K')$. Using
the results given by Madow [11], it can be shown that the simultaneous
distribution of V and R is

(55) $\frac{1 - \rho^N}{2^{\frac{1}{2}K''}\,\Gamma(\frac{1}{2}K'')\,\sigma^{K''}\sqrt{\prod_{j'}(1 + \rho^2 - 2\rho\lambda_{1j'})}}\; V^{\frac{1}{2}K'' - 1} \exp\left[-\frac{V(1 + \rho^2 - 2\rho R)}{2\sigma^2}\right] D(R)$,

where $D(R)$ is the density function corresponding to (48). Integrating V from 0
to $\infty$, we obtain as the density for R


(56) 


(1 - p’^XiK" - 1 ) 




(1 + P* - ZpXly) 


(1 + P* - 2pR)*^" 


■ g (-l)*^^(Xi' - I I ^ 





for < R < , where V* are given by (60). In the same way, one obtains 

the distribution of ${}_LR$ for $\rho \neq 0$ when $N = 2L$, $N = 3L$, and $N = 4L$ by
multiplying (52), (53), and (54), respectively, by

(57) $\frac{(1 - \rho^p)^a}{(1 + \rho^2 - 2\rho R)^{\frac{1}{2}K''}\sqrt{\prod_{j'}(1 + \rho^2 - 2\rho\lambda_{Lj'})}}$,

where $K'' = n_1 + n_2$ or $n_1 + n_2 + n_3$. This method was used by Madow for
residuals from the sample mean [12].


6. Moments.

6.1. The exact moments of R. Most of the results of this section are
straightforward adaptations of earlier results for the case of $\mu_i$ constant. Hence, we shall
omit the details of derivations. The moment generating function of V and C
for $\sigma^2 = 1$ is

(58) $\phi(t_0, t) = E(e^{t_0 V + tC}) = \frac{1 - \rho^N}{\sqrt{\prod_{j''=K'+1}^{N}[1 + \rho^2 - 2t_0 - 2(\rho + t)\lambda_{1j''}]}}$.

The h-th moment of $R = C/V$ is given by

(59) $\mu_h'(R) = \int_{-\infty}^{0}\int_{-\infty}^{0}\cdots\int_{-\infty}^{0} \left[\frac{\partial^h \phi(t_0, t)}{\partial t^h}\right]_{t=0} dt_0 \cdots dt_0$,

with the $(t_0)$ restricted from being too large (not more than a certain amount
larger than zero). In the case of independence ($\rho = 0$), we have the following
first two moments of R:

(60) $\mu_1'(R) = \frac{1}{K''}\sum_{j''=K'+1}^{N} \lambda_{1j''}$, $\qquad \mu_2'(R) = \frac{2}{K''(K'' + 2)}\sum_{j''=K'+1}^{N} \lambda_{1j''}^2 + \frac{K''}{K'' + 2}[\mu_1'(R)]^2$.

If the $\lambda_{1j''}$ are symmetrical (i.e. for each $\lambda_{1j''}$ there is a $\lambda_{1k''} = -\lambda_{1j''}$), the
mean of R is 0. For example, if 1 and $(-1)^t$ are fitted for N even, the mean is 0.
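
The two moments in (60) depend only on the roots of the terms not fitted, so they can be evaluated directly. In the sketch below the helper names are ours, each unfitted cosine and sine pair contributes its root twice, and the two examples ($N = 12$ with 1 and $(-1)^t$ fitted; $N = 24$ with the constant and a period 4 pair fitted) are illustrations chosen for this note.

```python
import math


def unfitted_lambdas(n, fitted_freqs, lag=1):
    """One root cos(2 pi L f / N) per trigonometric term that is not fitted."""
    lams = []
    for f in range(n // 2 + 1):
        if f in fitted_freqs:
            continue
        # f = 0 and f = N/2 are single terms; every other frequency is a cos/sin pair.
        multiplicity = 1 if (f == 0 or 2 * f == n) else 2
        lams.extend([math.cos(2 * math.pi * lag * f / n)] * multiplicity)
    return lams


def first_two_moments(lams):
    """Mean and second raw moment of R under rho = 0, following (60)."""
    k = len(lams)  # K'', the number of terms not fitted
    mu1 = sum(lams) / k
    mu2_raw = 2 * sum(v * v for v in lams) / (k * (k + 2)) + k / (k + 2) * mu1 ** 2
    return mu1, mu2_raw


# Symmetric case: 1 and (-1)^t fitted, N = 12.
print(first_two_moments(unfitted_lambdas(12, {0, 6})))
# Asymmetric case: constant and the period 4 pair fitted, N = 24.
print(first_two_moments(unfitted_lambdas(24, {0, 6})))
```

In the symmetric case the mean comes out as zero up to rounding, as the text states; in the single-period case it is slightly negative.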

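6.2. Approximate moments of R when $\rho = 0$. Since R and V are independent
[8] when $\rho = 0$, $\mu_h'(R) = \mu_h'(C)/\mu_h'(V)$. V is a sum of squares and its moments are
the same as for $\chi^2$ with $N - K' = K''$ degrees of freedom. Using methods similar
to those given by Dixon [5], we see that the moment generating function for
C is

(61) $\alpha(t) \cdot \beta(t) \cdot \gamma(t)$,

where

(62) $\alpha(t) = (2/A)^{N/2}$, $\quad \beta(t) = A^N/[A^N - (2t)^N]$,

$\gamma(t) = \prod_{j'} (1 - 2t\lambda_{1j'})^{1/2}$, and $A = 1 + \sqrt{1 - 4t^2}$.

In this case, $\lambda_{1j'} = \cos\frac{2\pi f'}{N}$ includes all $K'$ terms corresponding to those in $m_t$.
Since the first $N - 1$ derivatives of $\beta(t)$ are zero at $t = 0$, we can use

(63) $\Phi(t) = \alpha(t) \cdot \gamma(t) = \left[\frac{2}{1 + \sqrt{1 - 4t^2}}\right]^{N/2} \gamma(t)$

as an approximation to (61). This expression yields the exact moments of C
up to order $N - 1$.

As a special case, consider $K' = 3$, with $\lambda_{1,1} = 1$ and $\lambda_{1,2} = \lambda_{1,3} = \lambda = \cos\frac{2\pi}{P}$.
In this case

(64) $\Phi_3(t) = \Phi_1(t) - 2t\lambda\,\Phi_1(t)$,

where $\Phi_1$ and $R_1$ refer to the case in which only the constant term is fitted and
$\Phi_3$ and $R_3$ to the case $K' = 3$. Successive derivatives of (64) at $t = 0$ show that

(65) $\mu_h'(R_3) = \mathcal{P}\,\mu_h'(R_1) - h\,\mathcal{Q}\cos\frac{2\pi}{P}\,\mu_{h-1}'(R_1)$,

where $\mathcal{P} = \mu_h'(V_1)/\mu_h'(V_3) = (N - 3 + 2h)/(N - 3)$, $\mathcal{Q} = 2\mu_{h-1}'(V_1)/\mu_h'(V_3) =
2/(N - 3)$, and $h = 1, 2, \ldots, N - 1$.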

6.3. Approximate moment generating function of C and V when $\rho \neq 0$. To obtain
an approximate moment generating function for C and V when $\rho \neq 0$, we utilize
an approximation method given by Leipnik [10]. The exact moment generating
function (58) with $\sigma^2 = 1$ can be written as

(66) $\phi(t_0, t) = (1 - \rho^N)\,\theta \exp\left\{-\frac{1}{2}\sum_{f=1}^{N} \log\left[1 + \rho^2 - 2t_0 - 2(\rho + t)\cos\frac{2\pi f}{N}\right]\right\}$,

where $\theta = \prod_{j'} [1 + \rho^2 - 2t_0 - 2(\rho + t)\lambda_{1j'}]^{1/2}$ and $j'$ refers to the $K'$ fitted terms
in $m_t$. If the sum in the exponent of (66) is replaced by

(67) $\frac{N}{2\pi}\int_0^{2\pi} \log\left[1 + \rho^2 - 2t_0 - 2(\rho + t)\cos x\right] dx$,

and if $(1 - \rho^N)$ is replaced by 1, we obtain the approximate moment generating
function

(68) $\bar\phi = \theta \left\{\frac{1}{2}\left[1 + \rho^2 - 2t_0 + \sqrt{(1 + \rho^2 - 2t_0)^2 - 4(\rho + t)^2}\right]\right\}^{-N/2}$.

7. Approximate distributions of R.

7.1. The Pearson Type I (Incomplete Beta) distribution. The significance points
of ${}_LR$ can be found exactly from equation (48) for $L = 1$ and by integrating
equations (52), (53), and (54) for $N = 2L$, $3L$, and $4L$, respectively. These exact
probability integrals for $N = 2L$, $3L$, and $4L$ are simply sums of Incomplete
Beta functions, and the significance points can be found in Pearson's Tables of
the Incomplete Beta-Function [14] or in the Thompson tables [16]. However, the
computation of the exact significance points for $L = 1$ and $K'' > 4$ by use of
equation (48) is quite tedious and actually impossible for large N with present
logarithm tables and readily available computing devices. Hence, approximate
distributions are called for.

The Type I approximation to the distribution of R is

(69) $f(R) = \frac{(1 + R)^{p-1}(1 - R)^{q-1}}{2^{p+q-1} B(p, q)}$,

where p and q are chosen so that the first two moments of this approximate
distribution agree with the first two moments of the exact distribution. It can be
shown that each moment of the approximate distribution approaches the
corresponding exact moment quite rapidly as N increases. On the basis of the
approximation, the probability $\alpha$ of the significance point $R'$ being exceeded can
be found from the Incomplete Beta function. Thus

(70) $\alpha = \Pr\{R > R'\} = 1 - I_x(p, q) = I_{x'}(p', q')$,

where

(71) $I_x(p, q) = \frac{1}{B(p, q)}\int_0^x y^{p-1}(1 - y)^{q-1}\,dy$,

and $x = (1 + R')/2$, $x' = (1 - x)$, $p' = q$, and $q' = p$. Hence, $R' = 2x - 1 =
1 - 2x'$.

The parameters in (69) are taken to be

(72) $2p = (1 + \mu_1')(1 - \mu_2')/\mu_2$; $\quad 2q = (1 - \mu_1')(1 - \mu_2')/\mu_2$,

where $\mu_2 = \mu_2' - (\mu_1')^2$ and $\mu_2' = \mu_2'(R)$ given in (60). Hence, when the
distribution of R is symmetric, $\mu_1' = 0$ and $2p = 2q = (1 - \mu_2)/\mu_2$.
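
To illustrate (69) and (70), the sketch below finds a significance point numerically instead of from tables. Simpson's rule and bisection are our substitutes for the Incomplete Beta tables, the function names are hypothetical, and the example takes $2p = 2q = 13$ ($\mu_1' = 0$, $\mu_2 = 1/14$), which is the symmetric case with all twelve terms fitted and $N = 24$. The integration assumes $q > 1$ so that the density vanishes at $R = 1$.

```python
import math


def beta_tail(r_prime, p, q, steps=2000):
    """Pr{R > r_prime} under density (69), by composite Simpson's rule on [r_prime, 1]."""
    log_norm = ((p + q - 1) * math.log(2.0)
                + math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))

    def density(r):
        if r <= -1.0 or r >= 1.0:
            return 0.0  # valid at the end points when p > 1 and q > 1
        return math.exp((p - 1) * math.log1p(r) + (q - 1) * math.log1p(-r) - log_norm)

    h = (1.0 - r_prime) / steps  # steps must be even for Simpson's rule
    total = density(r_prime) + density(1.0)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(r_prime + k * h)
    return total * h / 3.0


def significance_point(alpha, p, q, tol=1e-7):
    """R' with Pr{R > R'} = alpha under (69), found by bisection on (-1, 1)."""
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_tail(mid, p, q) > alpha:
            lo = mid  # tail still too heavy: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


r05 = significance_point(0.05, 6.5, 6.5)
r01 = significance_point(0.01, 6.5, 6.5)
print(round(r05, 3), round(r01, 3))
```

The two values agree to three decimals with the first line of the last column of Table 1 (.441 and .592), which is the sense in which the ordinary correlation tables can be used for that trend.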

In Section 3.1, we set up significance points for four special trends for which
$\mu_1' = 0$:

(b) $P = 2$; (c) $P = 2, 4$; (d) $P = 2, 3, 6$; (e) $P = 2, 12/5, 3, 4, 6, 12$.

The values of $\mu_2$ for these four trends are:

(b) $(N - 4)/[N(N - 2)]$, (c) $1/(N - 2)$, (d) $1/(N - 4)$, (e) $1/(N - 10)$.

Naturally the third moments for these symmetric distributions are 0. The fourth
moments are as follows:

Trend   Exact                                              Incomplete Beta

(b)     $\frac{3(N^2 - 2N - 16)}{(N + 4)(N + 2)N(N - 2)}$          $\frac{3(N - 4)^2}{N(N - 2)(N^2 - 8)}$
(c)     $\frac{3(N^2 - 2N - 16)}{(N + 2)N(N - 2)(N - 4)}$          $\frac{3}{N(N - 2)}$
(d)     $\frac{3}{(N - 2)(N - 4)}$                              $\frac{3}{(N - 2)(N - 4)}$
(e)     $\frac{3}{(N - 8)(N - 10)}$                             $\frac{3}{(N - 8)(N - 10)}$

We note that for (d) and (e), the fourth moments for the Incomplete Beta are
exact and for (b) and (c), they approach the exact values quite rapidly as N
increases.










In Section 3.2 we considered some significance points for the following
single-period trends: $P = 3$, 4, 6, and 12. The values of 2p and 2q for these asymmetrical
cases are

(73) $2p = \frac{(N - 4 - 2\lambda)E}{D}$; $\quad 2q = \frac{(N - 2 + 2\lambda)E}{D}$,

where $\lambda = \cos\frac{2\pi}{P}$, $E = (N - 1)(N - 4) - 4\lambda$ and $D = (N - 3)(N - 1 + 4\lambda) -
(N - 1)(1 + 2\lambda)^2$.
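
Formula (73) can be checked against (60) and (72). The sketch below computes 2p and 2q both ways for the four single-period trends; the function names and the choice $N = 24$ are ours, and the period is assumed to divide N so that the fitted pair has frequency $N/P$.

```python
import math


def moments_single_period(n, period):
    """Mean and second raw moment of R from (60): constant and one period-P pair fitted."""
    fitted = {0, n // period}
    lams = []
    for f in range(n // 2 + 1):
        if f in fitted:
            continue
        multiplicity = 1 if 2 * f == n else 2  # (-1)^t is a single term
        lams.extend([math.cos(2 * math.pi * f / n)] * multiplicity)
    k = len(lams)
    mu1 = sum(lams) / k
    mu2_raw = 2 * sum(v * v for v in lams) / (k * (k + 2)) + k / (k + 2) * mu1 ** 2
    return mu1, mu2_raw


def beta_parameters_from_moments(mu1, mu2_raw):
    """2p and 2q from (72)."""
    variance = mu2_raw - mu1 ** 2
    scale = (1 - mu2_raw) / variance
    return (1 + mu1) * scale, (1 - mu1) * scale


def beta_parameters_closed_form(n, period):
    """2p and 2q from (73)."""
    lam = math.cos(2 * math.pi / period)
    e = (n - 1) * (n - 4) - 4 * lam
    d = (n - 3) * (n - 1 + 4 * lam) - (n - 1) * (1 + 2 * lam) ** 2
    return (n - 4 - 2 * lam) * e / d, (n - 2 + 2 * lam) * e / d


for period in (3, 4, 6, 12):
    direct = beta_parameters_from_moments(*moments_single_period(24, period))
    closed = beta_parameters_closed_form(24, period)
    print(period, [round(v, 4) for v in direct], [round(v, 4) for v in closed])
```

For $P = 3$ both routes give $2p = 2q = N - 2$, the symmetric approximation mentioned in Section 3.2; for the other periods 2p is smaller than 2q, reflecting the negative mean of R.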

Equation (69) has the drawback of using the range $(-1, +1)$ instead of the
true range of R, which varies between the last (smallest) $\lambda_k$ to the first (largest)
$\lambda_k$. For example, if $N = 12$ and we fit the constant, $\cos\frac{2\pi t}{12}$, and $\sin\frac{2\pi t}{12}$, then
$\lambda_{1,1} = 1$, $\lambda_{1,2} = \lambda_{1,3} = \cos\frac{2\pi}{12}$, and the range of R is $\left(-1, \cos\frac{4\pi}{12}\right)$.
However, if we fit the constant and $\cos \pi t = (-1)^t$, then $\lambda_{1,1} = 1$ and $\lambda_{1,2} = -1$,
the true range would be $\left(-\cos\frac{2\pi}{12}, \cos\frac{2\pi}{12}\right)$. From these examples we see that the
error in using the approximate range $(-1, +1)$ varies according to the fitted
terms in $m_t$, and that the error is worse on one tail than on the other, unless
symmetric terms are fitted. A more accurate approximation could be obtained
by use of the exact curtailed range, but it was not thought desirable because the
exact range rapidly approaches the approximate range as N increases.
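
The two ranges quoted for $N = 12$ follow from the smallest and largest roots among the terms not fitted. A short sketch, with a hypothetical helper `range_of_r`:

```python
import math


def range_of_r(n, fitted_freqs, lag=1):
    """Smallest and largest cos(2 pi L f / N) over the frequencies f that are not fitted."""
    lams = [math.cos(2 * math.pi * lag * f / n)
            for f in range(n // 2 + 1) if f not in fitted_freqs]
    return min(lams), max(lams)


# Constant plus the pair cos(2 pi t / 12), sin(2 pi t / 12) fitted.
print(range_of_r(12, {0, 1}))
# Constant and (-1)^t fitted.
print(range_of_r(12, {0, 6}))
```

The first range is one-sided, from $-1$ to $\cos(4\pi/12) = \frac{1}{2}$, while the second is symmetric about zero, which matches the remark that the error is worse on one tail unless symmetric terms are fitted.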

We might add that the significance point, $R'$, can also be calculated from the
Inverted Beta (F) distribution, for which tables are given by Merrington and
Thompson [13], Snedecor [15], and Fisher and Yates [6]. Cochran [4] has provided
an approximate formula for $z = \frac{1}{2}\log_e F$ when $n_1$ and $n_2$ are not given in the
F-tables.

7.2. The normal approximation. It should be noted that R is asymptotically
normally distributed for $\rho = 0$, as shown by the form of the characteristic
function. We have considered the normal approximation with mean $\mu_1'(R)$ and
variance $\mu_2(R)$. The variance of R was given in the previous section for the four
special trends. For all single period trends, except $P = 2$, $\mu_1' = -(1 + 2\lambda)/(N - 3)$
and the variance is

(74) $\mu_2 = \frac{N - 1 + 4\lambda}{(N - 1)(N - 3)} - (\mu_1')^2$,

where, as before, $\lambda = \cos(2\pi/P)$. Further terms in an asymptotic expansion of
the distribution would take account of higher moments of R as Hsu has done
for the case of fitting only the mean ($m_i$ = a constant) [7].
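
A sketch of the normal approximation for a single-period trend follows, assuming the variance is the second raw moment from (60) less the squared mean. `NormalDist` is from the Python standard library; the function name and the example ($P = 12$, $N = 300$, $\alpha = .01$) are ours.

```python
import math
from statistics import NormalDist


def normal_significance_point(n, period, alpha):
    """Upper one-tailed point R' from the normal approximation to R (rho = 0)."""
    lam = math.cos(2 * math.pi / period)
    mean = -(1 + 2 * lam) / (n - 3)
    second_raw = (n - 1 + 4 * lam) / ((n - 1) * (n - 3))
    variance = second_raw - mean ** 2
    z = NormalDist().inv_cdf(1 - alpha)  # standard normal deviate for the upper tail
    return mean + z * math.sqrt(variance)


print(round(normal_significance_point(300, 12, 0.01), 4))
```

The result is a little under .125; the text notes that for the last entries of Table 2 the normal approximation overshoots the tabulated point slightly in absolute value.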


REFERENCES

[1] R. L. Anderson, Serial correlation in the analysis of time series, unpublished thesis,
Library, Iowa State College, 1941.

[2] R. L. Anderson, "Distribution of the serial correlation coefficient," Annals of Math.
Stat., Vol. 13 (1942), pp. 1-13.

[3] T. W. Anderson, "On the theory of testing serial correlation," Skandinavisk
Aktuarietidskrift, Vol. 31 (1948), pp. 88-116.

[4] W. G. Cochran, "Note on an approximate formula for the significance levels of z,"
Annals of Math. Stat., Vol. 11 (1940), pp. 93-95.

[5] W. J. Dixon, "Further contributions to the problem of serial correlation," Annals of
Math. Stat., Vol. 15 (1944), pp. 119-144.

[6] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural, and Medical
Research, 2d ed., Oliver and Boyd Ltd., 1943.

[7] P. L. Hsu, "On the asymptotic distributions of certain statistics used in testing the
independence between successive observations from a normal population," Annals
of Math. Stat., Vol. 17 (1946), pp. 350-364.

[8] T. Koopmans, "Serial correlation and quadratic forms in normal variables," Annals of
Math. Stat., Vol. 13 (1942), pp. 14-23.

[9] E. L. Lehmann, "On optimum tests of composite hypotheses with one constraint,"
Annals of Math. Stat., Vol. 18 (1947), p. 481.

[10] R. B. Leipnik, "Distribution of the serial correlation coefficient in a circularly
correlated universe," Annals of Math. Stat., Vol. 18 (1947), pp. 80-87.

[11] W. G. Madow, "Contributions to the theory of multivariate statistical analysis,"
Trans. Am. Math. Soc., Vol. 44 (1938), p. 461.

[12] W. G. Madow, "Note on the distribution of the serial correlation coefficient," Annals
of Math. Stat., Vol. 16 (1945), pp. 308-310.

[13] M. Merrington and C. M. Thompson, "Tables of percentage points of the inverted
Beta (F) distribution," Biometrika, Vol. 33 (1943), pp. 73-88.

[14] K. Pearson, Tables of the Incomplete Beta-Function, Cambridge Univ. Press, 1934.

[15] G. W. Snedecor, Statistical Methods, 4th ed., Iowa State College Press, 1946, pp.
222-225.

[16] C. M. Thompson, "Tables of percentage points of the incomplete beta-function,"
Biometrika, Vol. 32 (1941), pp. 161-181.

[17] United States Department of Agriculture, Agricultural Statistics, United States
Government Printing Office, Washington, D. C., 1939, p. 390.



BAYES SOLUTIONS OF SEQUENTIAL DECISION PROBLEMS
By A. Wald and J. Wolfowitz
Columbia University

Summary. The study of sequential decision functions was initiated by one of
the authors in [1]. Making use of the ideas of this theory the authors succeeded in
[4] in proving the optimum character of the sequential probability ratio test.
In the present paper the authors continue the study of sequential decision
functions, as follows:

a) The proof of the optimum character of the sequential probability ratio
test was based on a certain property of Bayes solutions for sequential decisions
between two alternatives, the cost function being linear. This fundamental
property, the convexity of certain important sets of a priori distributions, is
proved in Theorem 3.9 in considerable generality. The number of possible
decisions may be infinite.

b) Theorem 3.10 and section 4 discuss tangents and boundary points of these
sets of a priori distributions.

(These results for finitely many alternatives were announced by one of us
in an invited address at the Berkeley meeting of the Institute of Mathematical
Statistics in June, 1948.)¹

c) Theorem 3.6 is an existence theorem for Bayes solutions. Theorem 3.7
gives a necessary and sufficient condition for a Bayes solution. These theorems
generalize and follow the ideas of Lemma 1 of [4].

d) Theorems 3.8 and 3.8.1 are continuity theorems for the average risk
function. They generalize Lemma 3 in [4].

e) Other theorems give recursion formulas and inequalities which govern
Bayes solutions.

1. Introduction. In a previous publication of one of the authors [1] the decision
problem was formulated as follows: Let $X = \{X_i\}$ ($i = 1, 2, \cdots$, ad inf.) be
a sequence of chance variables. An observation on $X$ is given by a sequence
$x = \{x_i\}$ ($i = 1, 2, \cdots$, ad inf.) of real values, where $x_i$ denotes the observed
value of $X_i$. A sequence $x$ is also called a sample or sample point, and the totality
$M$ of all possible sample points $x$ is called the sample space. Let $G(x)$ denote the
probability that $X_i < x_i$ for $i = 1, 2, \cdots$, ad inf.; i.e., $G$ is the cumulative
distribution function of $X$. In a statistical decision problem $G$ is assumed to be
unknown. It is merely known that $G$ is an element of a given class $\Omega$ of distribution
functions. There is given, furthermore, a space $D^*$ whose elements $d$ represent
the possible decisions that can be made in the problem under consideration.

¹ A ... statement of some of the results of the present paper is to be found ...




The problem is to construct a function $d = D(x)$, called the decision function,
which associates with each sample point $x$ an element $d$ of $D^*$ so that the decision
$d = D(x)$ is made when $x$ is observed.

Occasionally we shall use the symbol $D$ to denote a decision function $D(x)$.
This will be done especially when we want to emphasize that we mean the whole
decision function and not merely a particular value of it corresponding to some
particular $x$.

If $d = D(x)$ is the decision function adopted and if $x^0 = \{x_i^0\}$ ($i = 1, 2, \cdots$)
is the particular sample point observed, the number of components of $x^0$ we have
to observe in order to reach a decision is equal to the smallest positive integer
$n = n(x^0)$ with the property that $D(x) = D(x^0)$ for any $x$ for which $x_1 = x_1^0, \cdots,
x_n = x_n^0$. If no finite $n$ exists with the above property, we put $n(x) = \infty$. If
$D(x)$ is equal to a constant $d$, we put $n(x) = 0$. We shall call $n(x)$ the number
of observations required by $D$ when $x$ is the observed sample. Of course, $n(x)$
depends also on the decision rule $D$ adopted. To put this in evidence, we shall
occasionally write $n(x, D)$ instead of $n(x)$. If $D_0$ is a decision function such that
$n(x, D_0)$ has a constant value over the whole sample space $M$, we have the classical
non-sequential case. If $n(x, D_0)$ is not constant, we shall say that $D_0$ is a sequential
decision function.

In the remainder of this section we shall sketch briefly some of the fundamental 
notions of the theory without regard to regularity conditions. The latter will be 
discussed in the next section. 

In [1] a weight function $W(G, d)$ was introduced which expresses the loss suffered
by the statistician when $G$ is the true distribution of $X$ and the decision $d$ is
made. Let $c(n)$ denote the cost of making $n$ observations; i.e., $c(n)$ is the cost
of observing the values of $X_1, \cdots, X_n$. Then, if the decision function $d = D(x)$
is adopted and $G$ is the true distribution of $X$, the expected value of the loss due
to possible erroneous decisions plus the expected cost of experimentation is
given by

(1.1)    $r(G, D) = \int_M W[G, D(x)]\,dG(x) + \int_M c[n(x, D)]\,dG(x).$

The above expression is called the risk when $D$ is the decision function adopted
and $G$ is the true distribution.

Let $\xi$ be an a priori probability distribution on $\Omega$; i.e., $\xi$ is a probability measure
defined over a suitably chosen Borel field² of subsets of $\Omega$. Then the expected value
of $r(G, D)$ is given by

(1.2)    $r(\xi, D) = \int_\Omega r(G, D)\,d\xi.$


² A Borel field is an aggregate of sets such that a) the null set is a member of the field,
b) the complement with respect to the entire space (here M) is a member of the field, c)
the sum of denumerably many members of the field is itself in the field.





The above expression is called the risk when $\xi$ is the a priori distribution on $\Omega$
and $D$ is the decision function adopted.

We shall say that the decision function $D_0$ is a Bayes solution relative to the
a priori distribution $\xi$ if

(1.3)    $r(\xi, D_0) \le r(\xi, D)$ for all $D$.

If there existed an a priori distribution on $\Omega$ and if this distribution were known,
we could put $\xi$ equal to this a priori distribution and a Bayes solution relative to
$\xi$ would provide a very satisfactory solution of the decision problem. In most
applications, however, not even the existence of an a priori distribution can be
postulated. Nevertheless, the study of Bayes solutions corresponding to various
a priori distributions is of great interest in view of some results given in [1].
It was shown in [1] that under rather general conditions the class $C$ of the Bayes
solutions corresponding to all possible a priori distributions $\xi$ has the following
property: If $D_1$ is a decision function that is not an element of $C$, there exists
a decision function $D_2$ in $C$ such that

(1.4)    $r(G, D_2) \le r(G, D_1)$ for all $G$

and

(1.5)    $r(G, D_2) < r(G, D_1)$ for at least one $G$.

It was furthermore shown in [1] that under general conditions a minimax
solution $D_0$ of the decision problem is also a Bayes solution corresponding to
some a priori distribution $\xi$. By a minimax solution we mean a decision function
$D_0$ such that, for all $D$,

(1.6)    $\sup_G r(G, D_0) \le \sup_G r(G, D).$
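When both the class of distributions and the collection of decision functions under consideration are finite, definitions (1.3) and (1.6) reduce to minimizations over a table of risks. The following Python sketch illustrates this under that assumption; the table `risk` and the function names are hypothetical and serve only as an illustration.

```python
def bayes_solution(risk, prior):
    """Index of a rule minimizing the average risk r(xi, D), as in (1.3).

    risk[g][j] is r(G_g, D_j); prior[g] is the a priori probability of G_g.
    """
    n_rules = len(risk[0])
    avg = [sum(w * row[j] for w, row in zip(prior, risk)) for j in range(n_rules)]
    return min(range(n_rules), key=avg.__getitem__), avg


def minimax_solution(risk):
    """Index of a rule minimizing the largest risk over G, as in (1.6)."""
    n_rules = len(risk[0])
    worst = [max(row[j] for row in risk) for j in range(n_rules)]
    return min(range(n_rules), key=worst.__getitem__), worst


# Hypothetical risk table: rows are distributions G1, G2; columns are rules D1, D2, D3.
risk = [[1.0, 4.0, 2.5],
        [5.0, 2.0, 2.5]]

print(bayes_solution(risk, [0.5, 0.5]))
print(bayes_solution(risk, [0.9, 0.1]))
print(minimax_solution(risk))
```

In this made-up table the third rule is both the minimax rule and a Bayes rule relative to the uniform prior, which is in line with the connection between minimax and Bayes solutions stated above.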


2. Regularity conditions and other assumptions. We shall make the following 
assumptions: 

Assumption 1. The chance variables are identically and independently distributed.
The common distribution is either discrete or absolutely continuous.

Let $p(a \mid F)$ denote the elementary probability law of $X_i$ when $F$ is the
distribution of $X_i$; i.e., when $F$ is discrete, $p(a \mid F)$ is the probability that $X_i = a$,
and when $F$ is absolutely continuous, $p(a \mid F)$ is the probability density of $X_i$.

Let $B$ be the smallest Borel field which contains
all sets of points $x$ which are defined by the relations




""I —I 

Each admissible $F$ thus determines a probability measure $F^*$ on $B$. Let $\Omega$ be
the totality of these probability measures $F^*$,³ and $H^*$
be a given Borel field of subsets of $\Omega$. The only subsets of $\Omega$ which we shall
discuss in this paper will be members of $H^*$, and all probability measures on $\Omega$
which we shall discuss will be measurable ($H^*$). This will henceforth be assumed
without further repetition.

³ An $F$ or $F^*$ is admissible if $F^*$ is in $\Omega$.

Let $A^*$ be any set in $H^*$, and $A$ the set of $F$ which corresponds to the $F^*$ in
$A^*$. The sets $A$ form a Borel field, say $H$. By definition, the probability measure
of a set $A$ according to a probability measure $\xi$ ($H^*$) on $\Omega$ is to be the same as the
probability measure of $A^*$ according to $\xi$.

Let $M \times \Omega$ be the Cartesian product of $M$ and $\Omega$ ([5], page 82), and $K$ be the
smallest Borel field of subsets of $M \times \Omega$ which contains the Cartesian product
of any member of $B$ by any member of $H^*$.

For a given decision function $d = D(x)$, $W(F, D(x))$ is a function of $F$ and $x$.
Hereafter in this paper we shall limit ourselves to functions $D(x)$ such that
$W(F, D(x))$ is measurable ($K$), and $n(x, D)$ is measurable ($B$).

It is true that in Section 1, $W$ was given as a function of $G$, the distribution of
$X$. Because of Assumption 1, $G = F^*$, and there is a one-to-one correspondence
between $F$ and $F^*$. Thus we may, in appropriate places, interchange them freely.

Assumption 2. For every real $a$, except possibly on a Borel set⁴ whose probability
is zero according to every admissible $F$, $p(a \mid F)$ exists and is a function of $a$ and
$F$ which is measurable ($K$). If the admissible distributions $F$ are discrete, there exists a
fixed sequence $\{b_i\}$ ($i = 1, 2, \cdots$, ad inf.) of real values such that
$\sum_{i=1}^{\infty} p(b_i \mid F) = 1$ for all admissible $F$.

Assumption 3. $W(F, d)$ is bounded. For every $d$ in $D^*$, $W(F, d)$ is a function
of $F$ which is measurable ($H$).

In what follows $\xi$ will always denote a probability measure ($H^*$) on $\Omega$. Thus

    $W(\xi, d) = \int_\Omega W(F, d)\,d\xi$

exists.

Assumption 4. The function $c(n) = cn$. Without loss of generality we may
take $c = 1$, so that $c(n) = n$.

We shall introduce the following convergence definition in the space $D^*$:
the sequence $\{d_i\}$ converges to $d_0$ if

    $\lim_{i \to \infty} W(F, d_i) = W(F, d_0)$

uniformly in the admissible $F$'s.

Assumption 5. The space $D^*$ is compact in the sense of the above convergence
definition.

One can easily verify that, if $\lim_{i \to \infty} d_i = d_0$, then

    $\lim_{i \to \infty} W(\xi, d_i) = W(\xi, d_0)$;

⁴ A Borel set is a member of the smallest Borel field which contains all the open sets of
the real line.





i.e., $W(\xi, d)$ is a continuous function of $d$. Thus, because of Assumption 5, the
minimum of $W(\xi, d)$ with respect to $d$ exists.

We shall now show that, under the above conditions,

(2.1)    $\int_M W[F^*, D(x)]\,dF^*(x)$

exists and is a function of $F^*$ measurable ($H^*$). For any $j$ let $M_j$ be the set in $B$
such that $n(x, D) = j$. Then it is enough to show that, for any $j$,

(2.2)    $\int_{M_j} W[F^*, D(x)]\,dF^*(x)$

exists and is a function of $F^*$ measurable ($H^*$).

In the discrete case, the integral (2.2) is equal to the sum⁵

(2.3)    $\sum_{M_j} W[F, D(x)]\,p(x_1 \mid F) \cdots p(x_j \mid F).$

For fixed values of $x_1, \cdots, x_j$, the expression under the summation sign is
obviously a function of $F^*$ measurable ($H^*$). Since, because of Assumption 2,
there are only countably many points $(x_1, \cdots, x_j)$ in $M_j$, the sum (2.3) must
be a function of $F^*$ measurable ($H^*$).

In the absolutely continuous case, the integral (2.2) is equal to

(2.4)    $\int_{M_j} W[F, D(x)] \prod_{i=1}^{j} p(x_i \mid F)\,d\nu^{(j)}$

where $\nu^{(j)}$ is Borel measure in the $j$-dimensional Euclidean space. The integrand
is measurable ($K$). Hence, the integral (2.4) exists and is a function of $F^*$
measurable ($H^*$) (see [5], Chapter III, Theorems 9.3 and 9.8).


⁵ ... (2.3) and (2.4) ... proceed as ...


3. Some results concerning Bayes solutions. If $\xi$ is the a priori probability
measure on $\Omega$, the a posteriori probability of a subset $\omega$ of $\Omega$ for given values
$x_1, \cdots, x_m$ of the first $m$ chance variables is given by

(3.1)    $\xi(\omega \mid \xi, x_1, \cdots, x_m) = \dfrac{\int_\omega p(x_1 \mid F) \cdots p(x_m \mid F)\,d\xi}{\int_\Omega p(x_1 \mid F) \cdots p(x_m \mid F)\,d\xi}.$

Let

(3.2)    $\rho_0(\xi) = \min_d W(\xi, d).$

Let $\rho_m(\xi)$ denote the infimum of $r(\xi, D)$ with respect to $D$ where $D$ is
restricted to decision functions for which $n(x, D) \le m$ for all $x$. For any positive
integer $m$, let $d = D^m(x)$ denote a decision function $D$ for which $n(x, D) \le m$
for all $x$. Thus, we can write

(3.3)    $\rho_m(\xi) = \inf_{D^m} r(\xi, D^m)$    ($m = 1, 2, \cdots$, ad inf.).

Let

(3.4)    $\rho(\xi) = \inf_D r(\xi, D).$

We shall first prove several theorems concerning the functions $\rho_0(\xi)$, $\rho_m(\xi)$,
and $\rho(\xi)$.
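When $\Omega$ and $D^*$ are finite, the a posteriori measure (3.1) and the quantity $\rho_0(\xi)$ can be computed directly. The Python sketch below illustrates this; the Bernoulli family, the loss table and all function names are assumptions introduced only for the example and do not appear in the paper.

```python
from math import prod


def posterior(prior, laws, xs):
    """A posteriori weights (3.1) for a finite set of candidate distributions.

    prior -- a priori probabilities of F_1, ..., F_k (summing to 1)
    laws  -- functions; laws[k](a) returns p(a | F_k)
    xs    -- observed values x_1, ..., x_m
    """
    joint = [w * prod(p(x) for x in xs) for w, p in zip(prior, laws)]
    total = sum(joint)  # denominator of (3.1): the sum over all of Omega
    if total == 0:
        raise ValueError("the sample has probability zero under this prior")
    return [j / total for j in joint]


def rho0(xi, loss):
    """rho_0(xi) = min over d of W(xi, d), where loss[k][j] = W(F_k, d_j)."""
    n_decisions = len(loss[0])
    return min(sum(w * row[j] for w, row in zip(xi, loss))
               for j in range(n_decisions))


# Hypothetical example: Bernoulli observations with success probability 0.3 or 0.6.
def bernoulli(theta):
    return lambda a: theta if a == 1 else 1.0 - theta


laws = [bernoulli(0.3), bernoulli(0.6)]
loss = [[0.0, 10.0],   # W(F_1, d_1), W(F_1, d_2)
        [10.0, 0.0]]   # W(F_2, d_1), W(F_2, d_2)
xi = posterior([0.5, 0.5], laws, [1, 1, 0])
print(xi, rho0(xi, loss))
```

Updating one observation at a time and updating with the whole sample at once give the same a posteriori weights, which is why only the current a posteriori measure needs to be carried along in a sequential procedure.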

Theorem 3.1. The following recursion formula holds:⁶

(3.5)    $\rho_{m+1}(\xi) = \min\left[\rho_0(\xi),\ 1 + \int_{-\infty}^{\infty} \rho_m(\xi_a)\,p(a \mid \xi)\,da\right]$    ($m = 0, 1, 2, \cdots$, ad inf.)

where

(3.6)    $\xi_a(\omega) = \xi(\omega \mid \xi, a)$ and $p(a \mid \xi) = \int_\Omega p(a \mid F)\,d\xi.$

Proof: Let $\rho_m^*(\xi)$ ($m = 1, 2, \cdots$, ad inf.) denote the infimum of $r(\xi, D)$ with
respect to $D$ where $D$ is subject to the restriction that $n(x, D) \ge 1$ and $\le m$ for
all $x$. Clearly,

(3.7)    $\rho_{m+1}(\xi) = \min[\rho_0(\xi),\ \rho_{m+1}^*(\xi)].$

Let $\rho_m^*(\xi \mid a)$ denote the infimum with respect to $D$ of the conditional risk
(conditional expected value of $W[F, D(x)] + n(x, D)$) when the first observation
$x_1$ on $X_1$ is $a$ and $D$ is restricted to decision functions for which $n(x, D) \ge 1$ and
$\le m$ for all $x$. Let $\bar D(m)$ be the temporary generic designation of such a decision
function. Let $D(m \mid a)$ be the decision function which is obtained from $\bar D(m)$
when the first observation is $a$. Finally let $r(\xi, D \mid a)$ be the conditional risk when
the a priori distribution function is $\xi$, $D$ is the decision function and requires at
least one observation, and the first observation is $a$. We then have that

    $r(\xi, \bar D(m+1) \mid a) = r(\xi_a, D(m+1 \mid a)) + 1.$

Hence

(3.8)    $\rho_{m+1}^*(\xi \mid a) = \rho_m(\xi_a) + 1.$

The unconditional quantity $\rho_{m+1}^*(\xi)$ must clearly be equal to the average value
of the infimum of the conditional risk. Thus we have

(3.9)    $\rho_{m+1}^*(\xi) = \int_{-\infty}^{\infty} \rho_{m+1}^*(\xi \mid a)\,p(a \mid \xi)\,da.$


⁶ If the distribution of $X$ is discrete, the integration with respect to $a$ is to be replaced
by summation with respect to $a$. This remark refers also to subsequent formulas.





Equation (3.5) follows from (3.7), (3.8) and (3.9). 
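The recursion (3.5) can be evaluated numerically in simple cases. The following Python sketch does so for two simple hypotheses with Bernoulli observations and unit cost per observation, so that the integral becomes a sum over the two possible outcomes. The parametrization (`g1` for the a priori probability of the first hypothesis, `th1` and `th2` for the two success probabilities, `w12` and `w21` for the two losses) is an assumption made for the illustration, not notation taken from the paper.

```python
def rho0(g1, w12, w21):
    """Risk of the best immediate decision: min over d of W(xi, d)."""
    return min((1.0 - g1) * w21, g1 * w12)


def rho(m, g1, th1, th2, w12, w21):
    """rho_m(xi) from recursion (3.5), Bernoulli observations, unit cost."""
    stop = rho0(g1, w12, w21)
    if m == 0:
        return stop
    cont = 1.0  # cost of taking the next observation (Assumption 4 with c = 1)
    for a in (0, 1):
        p1 = th1 if a == 1 else 1.0 - th1      # p(a | F_1)
        p2 = th2 if a == 1 else 1.0 - th2      # p(a | F_2)
        pa = g1 * p1 + (1.0 - g1) * p2         # p(a | xi), as in (3.6)
        if pa > 0:
            g1_post = g1 * p1 / pa             # a posteriori probability of F_1
            cont += pa * rho(m - 1, g1_post, th1, th2, w12, w21)
    return min(stop, cont)


# Hypothetical numbers: success probabilities 0.3 and 0.7, both losses equal to 20.
for m in range(4):
    print(m, rho(m, 0.5, 0.3, 0.7, 20.0, 20.0))
```

The direct recursion visits on the order of $2^m$ a posteriori states, so it is only suitable for small $m$; a tabulation over the attainable a posteriori probabilities would be the natural refinement.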

Theorem 3.2. The function $\rho(\xi)$ satisfies the following equation:

(3.10)    $\rho(\xi) = \min\left[\rho_0(\xi),\ \int_{-\infty}^{\infty} \rho(\xi_a)\,p(a \mid \xi)\,da + 1\right].$

The proof of this theorem is omitted, since it is essentially the same as that of
Theorem 3.1.

Theorem 3.3.⁷ The following inequalities hold:

(3.11)    $0 \le \rho_m(\xi) - \rho(\xi) \le \dfrac{W_0^2}{m}$    ($m = 1, 2, \cdots$, ad inf.)

where $W_0$ is the least upper bound of $W(F, d)$.

Proof: Let $\{D_i\}$ ($i = 1, 2, \cdots$, ad inf.) be a sequence of decision functions
such that

(3.12)    $\lim_{i \to \infty} r(\xi, D_i) = \rho(\xi).$

Let, furthermore, $P_i(\xi)$ denote the probability that at least $m$ observations will
be made when $D_i$ is the decision function adopted and $\xi$ is the a priori probability
measure on $\Omega$. Since $\rho(\xi) \le W_0$ and since

(3.13)    $r(\xi, D_i) \ge m P_i(\xi),$

it follows from (3.12) that

(3.14)    $\limsup_{i \to \infty} P_i(\xi) \le \dfrac{W_0}{m}.$

Let $D_i^m$ be the decision function obtained from $D_i$ as follows: $D_i^m(x) = D_i(x)$
for all $x$ for which $n(x, D_i) \le m$. $D_i^m(x)$ is equal to a fixed element $d_0$ for all $x$ for
which $n(x, D_i) > m$.

Clearly,

(3.15)    $r(\xi, D_i^m) \le r(\xi, D_i) + P_i(\xi) W_0.$

From (3.12), (3.14) and (3.15) it follows that

(3.16)    $\limsup_{i \to \infty} r(\xi, D_i^m) \le \rho(\xi) + \dfrac{W_0^2}{m}.$

Since $\rho_m(\xi)$ cannot exceed the left member of (3.16), the second half of
(3.11) follows from (3.16). The first half of (3.11) is obvious.

’ This theorem is essentially the same as Lemma 2.1 in 101 

(.F, ^sS'^tC^Z^ Consider the set V of couples 

V V V *k. where c le some real constant, Wo want to show that 

J1 Let V r'T" 'T r* T *) that W (fTL < c Than 

VoV. + y* X 7., so that v Ik 1^* • ^ hy Assumption 3. Finally we have Y » 





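An immediate consequence of Theorem 3.3 is the relation⁹

(3.17)    $\lim_{m \to \infty} \rho_m(\xi) = \rho(\xi).$

Theorem 3.4. If $\xi_1$ and $\xi_2$ are two probability measures on $\Omega$ such that¹⁰

(3.18)    $\dfrac{\xi_1(\omega)}{\xi_2(\omega)} \le 1 + \epsilon$ for all $\omega$,

then

(3.19)    $\rho(\xi_1) \le (1 + \epsilon)\rho(\xi_2).$

Proof: It follows from (3.18) that

(3.20)    $r(\xi_1, D) \le (1 + \epsilon)\,r(\xi_2, D)$ for all $D$.

Hence, (3.19) must hold.

The above theorem permits the computation of a simple and in many cases
useful lower bound of $\int_{-\infty}^{\infty} \rho(\xi_a)\,p(a \mid \xi)\,da$ as follows:

For any real value $a$, let $\epsilon_a$ be a non-negative value (not necessarily finite)
determined such that

(3.21)    $\dfrac{\xi(\omega)}{\xi_a(\omega)} \le 1 + \epsilon_a$ for all $\omega$.

Then

(3.22)    $\int_{-\infty}^{\infty} \rho(\xi_a)\,p(a \mid \xi)\,da \ge \int_{-\infty}^{\infty} \dfrac{\rho(\xi)}{1 + \epsilon_a}\,p(a \mid \xi)\,da = \rho(\xi) \int_{-\infty}^{\infty} \dfrac{p(a \mid \xi)}{1 + \epsilon_a}\,da.$

Since $\epsilon_a \ge 0$ and since $\rho_0(\xi) \ge \rho(\xi)$, we obviously have

(3.23)    $\rho(\xi) \int_{-\infty}^{\infty} \dfrac{p(a \mid \xi)}{1 + \epsilon_a}\,da \ge \rho(\xi) - \rho_0(\xi)\left[1 - \int_{-\infty}^{\infty} \dfrac{p(a \mid \xi)}{1 + \epsilon_a}\,da\right].$

Hence, we obtain the inequality

(3.24)    $\int_{-\infty}^{\infty} \rho(\xi_a)\,p(a \mid \xi)\,da \ge \rho(\xi) - \rho_0(\xi)\left[1 - \int_{-\infty}^{\infty} \dfrac{p(a \mid \xi)}{1 + \epsilon_a}\,da\right].$

An upper bound of the left hand member in (3.24) is obtained by replacing
$\rho$ by $\rho_0$; i.e.,

(3.25)    $\int_{-\infty}^{\infty} \rho(\xi_a)\,p(a \mid \xi)\,da \le \int_{-\infty}^{\infty} \rho_0(\xi_a)\,p(a \mid \xi)\,da.$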

⁹ A proof of (3.17) is contained implicitly in the work of Arrow, Blackwell and Girshick
([2], Section 1.3).

¹⁰ The left member of (3.18) is defined to be equal to 1 when $\xi_1(\omega) = \xi_2(\omega) = 0$.





The bounds given in (3.24) and (3.25) may be useful in constructing Bayes
solutions, since the following theorem holds:

Theorem 3.5. If

(3.26)    $\rho_0(\xi) > \int_{-\infty}^{\infty} \rho_0(\xi_a)\,p(a \mid \xi)\,da + 1,$

then $\rho(\xi) < \rho_0(\xi)$. If

(3.27)

then $\rho(\xi) = \rho_0(\xi)$.

The above theorem is an immediate consequence of (3.10), (3.24) and (3.25).

A decision procedure relative to a given a priori probability measure $\xi_0$ will
be given with the help of the function $\rho(\xi)$ as follows: If $\rho(\xi_0) = \rho_0(\xi_0)$, take a
final decision $d$ for which $W(\xi_0, d)$ is minimized. If $\rho(\xi_0) < \rho_0(\xi_0)$, take an
observation on $X_1$ and compute the a posteriori probability measure $\xi_1$. If $\rho(\xi_1) =
\rho_0(\xi_1)$, stop experimentation with a final decision $d$ for which $W(\xi_1, d)$ is
minimized. If $\rho(\xi_1) < \rho_0(\xi_1)$, take an observation on $X_2$ and compute the a posteriori
probability measure $\xi_2$ corresponding to the observed values of $X_1$ and $X_2$, and
so on. The above decision procedure will be shown later to be a Bayes solution.
Theorem 3.5 permits one to decide whether $\rho(\xi) < \rho_0(\xi)$ or $= \rho_0(\xi)$ whenever
$\xi$ satisfies (3.26) or (3.27). Theorem 3.5 will be useful when the class of all $\xi$'s
for which neither (3.26) nor (3.27) holds is small.

For the purposes of the next theorem let $\tilde D$ designate the decision procedure
described in the preceding paragraph. (We shall shortly show that $\tilde D$ is a decision
function in the sense of our definition.)

Let $\tilde D^1$ be the decision procedure where the first observation is taken and then
one proceeds according to $\tilde D$.

We shall now prove that $\tilde D$ and $\tilde D^1$ are Bayes solutions. More precisely, we
shall prove the following theorem:¹¹

Theorem 3.6. For any $\xi$, $\tilde D$ and $\tilde D^1$ as defined above are decision functions.
Let $D$ be any decision function for which $n(x, D) \ge 1$ and let

    $\rho^*(\xi) = \inf_D r(\xi, D).$

Then

    $r(\xi, \tilde D) = \rho(\xi)$

and

    $r(\xi, \tilde D^1) = \rho^*(\xi).$


existecoe theorems (l6], 





In view of this theorem, the operation "infimum with respect to $D$" in the
definitions of $\rho(\xi)$ and $\rho^*(\xi)$ can be replaced by "minimum with respect to $D$."

First we shall establish the measurability properties of $\tilde D$ and $\tilde D^1$. Since the
proofs are similar, we restrict ourselves to consideration of $\tilde D$. Let $\xi_{x_1, \cdots, x_m}$ be
the a posteriori distribution (3.1). From the ($B$) measurability of $\rho(\xi_{x_1, \cdots, x_m})$
and $\rho_0(\xi_{x_1, \cdots, x_m})$ it follows easily that $n(x, \tilde D)$ is measurable ($B$). It remains to
prove that $W(F, \tilde D(x))$ is measurable ($K$). For this purpose, let $L^i = (d_1^i, \cdots, d_{k_i}^i)$
be a sequence $1/i$ dense in $D^*$, i.e., for any $d \in D^*$ there exists $g \in D^*$ such that
$g \in L^i$ and $|W(F, d) - W(F, g)| < 1/i$ uniformly in $F$. (The existence of such
a sequence follows from Assumption 5.) Let now $D_i(x)$ be a decision function
defined as follows:

    $n(x, D_i) = n(x, \tilde D).$

Suppose $n(x, \tilde D) = m$ when the observations are $x_1, \cdots, x_m$. We define $D_i(x)$
to be such that $D_i(x)$ is an element of $L^i$ and

(3.28)    $W(\xi_{x_1, \cdots, x_m}, D_i(x)) = \min_{d \in L^i} W(\xi_{x_1, \cdots, x_m}, d),$

i.e., $D_i(x)$ takes the minimizing value of $d$. For any fixed $d$, the set of $x$'s satisfying
the equation $D_i(x) = d$ is without difficulty shown to be ($B$) measurable. Since
$D_i(x)$ assumes only a finite number of values in $D^*$, it follows from Assumption 3
that $W(F, D_i(x))$ is measurable ($K$). Now

    $\lim_{i \to \infty} W(F, D_i(x)) = W(F, \tilde D(x)),$

so that $W(F, \tilde D(x))$ is measurable ($K$).

We shall now prove that $\tilde D$ is a Bayes solution, i.e., that

(3.29)    $\rho(\xi) = r(\xi, \tilde D).$

In a similar way it can be proved that

(3.30)    $\rho^*(\xi) = r(\xi, \tilde D^1).$

If $\rho_0(\xi) = \rho(\xi)$, there can be no better decision function (from the point of
view of reducing the risk) than $\tilde D$, i.e., $\tilde D$ is a Bayes solution. Suppose then that

(3.31)    $\rho_0(\xi) > \rho(\xi).$

If (3.31) holds and $\tilde D$ is not a Bayes solution, there exists a decision function
$D_1$ such that

(3.32)    $r(\xi, D_1) < r(\xi, \tilde D)$

and

(3.33)    $r(\xi, D_1) < \dfrac{\rho_0(\xi) + \rho(\xi)}{2}.$





Now $D_1$ must require that at least one observation be taken, else (3.33) could
not hold. Thus $\tilde D$ and $D_1$ both require at least one observation.

Suppose one observation is taken. Let $r(\xi, D \mid a)$ denote the conditional risk
of proceeding according to $D$ when $\xi$ is the a priori distribution and $a$ is the
first observation. For a given $D$ we have that $r(\xi, D \mid a)$ is a function only of
$\xi_a$. In particular $r(\xi, \tilde D \mid a)$ and $r(\xi, D_1 \mid a)$ are functions only of $\xi_a$.

We can now apply to $r(\xi, \tilde D \mid a)$ and $r(\xi, D_1 \mid a)$ the same argument that
was applied above to $r(\xi, \tilde D)$ and $r(\xi, D_1)$, and conclude again as follows:
whenever $\rho_0(\xi_a) = \rho(\xi_a)$ (when one takes no more observations according to $\tilde D$),
taking additional observations cannot diminish the conditional risk below
$r(\xi, \tilde D \mid a)$ ($D_1$ may require an additional observation without having

    $r(\xi, D_1 \mid a) > r(\xi, \tilde D \mid a).$

This can happen when $\rho_0(\xi_a) = \rho^*(\xi_a)$). Whenever $\rho_0(\xi_a) > \rho(\xi_a)$ (when $\tilde D$
requires us to take another observation) two cases may occur: either a) $D_1$ requires
us to take another observation, in which case its decision is the same as that of
$\tilde D$, or b) $D_1$ requires us to stop taking observations. There exists then another
decision function whose conditional risk is less than

    $\dfrac{\rho_0(\xi_a) + \rho(\xi_a)}{2}.$

Both this decision function and $\tilde D$ require that another observation be taken.
We conclude that up to and including the first observation, $\tilde D$ coincides either
with $D_1$ or with another decision function $D_2$ whose risk is not greater than that
of $D_1$.

We continue in this manner for 2, 3, $\cdots$ observations. The above argument is
always valid because of Assumption 4 and because the past history of the process
(the sequence of observations) enters only through the a posteriori probability.
Thus we conclude that for any positive integer $h$ there exists a decision function
$D_h$ such that up to and including the $h$-th observation $\tilde D$ gives the same decision
as $D_h$ and the risk corresponding to $D_h$ does not exceed the risk corresponding
to $D_1$. Since $\lim_{h \to \infty} r(\xi, D_h) \ge r(\xi, \tilde D)$, (3.32) cannot hold. Hence (3.29) holds and
$\tilde D$ is a Bayes solution.

For any probability measure $\xi$ on $\Omega$ one of the following three conditions
must hold:

(1) $\min_d W(\xi, d) < r(\xi, D)$ for any $D$ for which $n(x, D) \ge 1$.

(2) $\min_d W(\xi, d) \le r(\xi, D)$ for all $D$ for which $n(x, D) \ge 1$, and the equality
sign holds for at least one $D$ with $n(x, D) \ge 1$.

(3) There exists a $D$ with $n(x, D) \ge 1$ such that $\min_d W(\xi, d) > r(\xi, D)$.

In view of Theorem 3.6, the conditions (1), (2) and (3) can be expressed by:
(1) $\rho_0(\xi) < \rho^*(\xi)$, (2) $\rho_0(\xi) = \rho^*(\xi)$ and (3) $\rho_0(\xi) > \rho^*(\xi)$, respectively.

We shall say that a probability measure $\xi$ on $\Omega$ is of the first type if it satisfies
(1), of the second type if it satisfies (2), and of the third type if it satisfies (3).
Since the a posteriori probability defined in (3.1) is also a probability measure
on $\Omega$, any a posteriori probability measure will be one of the three types
mentioned above.

We shall now prove the following characterization theorem:

Theorem 3.7.¹² A necessary and sufficient condition for a decision function
$d = D_0(x)$ to be a Bayes solution relative to a given a priori distribution $\xi_0$ is that
the following three relations be fulfilled for any sample point $x$, except perhaps
on a set whose probability measure is zero when $\xi_0$ is the a priori distribution in $\Omega$:

(a) For any $m < n(x, D_0)$, the a posteriori distribution $\xi(\omega \mid \xi_0, x_1, \cdots, x_m)$ is
either of the second or of the third type.

(b) For $m = n(x, D_0)$, the a posteriori distribution $\xi(\omega \mid \xi_0, x_1, \cdots, x_m)$ is either
of the first or the second type.

(c) For $m = n(x, D_0)$, we have

    $\min_d W(\xi_{x_1, \cdots, x_m}, d) = W(\xi_{x_1, \cdots, x_m}, D_0(x))$

where $\xi_{x_1, \cdots, x_m}$ stands for an a priori distribution that is equal to the a posteriori
distribution corresponding to $\xi_0, x_1, \cdots, x_m$.

Proof: We shall omit the proof of the sufficiency of the conditions (a), (b)
and (c), since it is essentially the same as that of Theorem 3.6. To prove the
necessity of these conditions, let $d = D_0(x)$ be a decision function and let $M^*$
denote the set of all sample points $x$ for which at least one of the relations (a),
(b) and (c) is violated. First, we shall show that $M^*$ is a set measurable ($B$).
Let $M_1^*$ be the set of all $x$'s for which (a) is violated, $M_2^*$ the set of all $x$'s for
which (b) is violated, and $M_3^*$ the set of all $x$'s for which (c) is violated. Clearly,
$M^*$ is shown to be measurable ($B$) if we can show that $M_i^*$ ($i = 1, 2, 3$) is
measurable ($B$). Let $M_{ir}^*$ ($r = 1, 2, \cdots$, ad inf.) denote the subset of $M_i^*$ for which
the first violation of the corresponding condition occurs for the sample $x_1, \cdots, x_r$.
We merely have to show that $M_{ir}^*$ is measurable ($B$) for all $i$ and $r$. The
measurability of $M_{3r}^*$ follows from the fact that $\min_d W(\xi_{x_1, \cdots, x_r}, d)$ and

    $W[\xi_{x_1, \cdots, x_r}, D_0(x)]$

are functions of $x$ measurable ($B$). To show the measurability of $M_{1r}^*$ and $M_{2r}^*$, it
is sufficient to show that the set of all samples $x_1, \cdots, x_r$ for which $\xi_{x_1, \cdots, x_r}$ is
of type $i$ ($i = 1, 2, 3$) is measurable ($B$). But this follows from the fact that
$\rho_0(\xi_{x_1, \cdots, x_r})$ and $\rho^*(\xi_{x_1, \cdots, x_r})$ are functions of $(x_1, \cdots, x_r)$ measurable ($B$). Hence,
$M^*$ is proved to be measurable ($B$).

For any $x$ in $M^*$ let $m(x)$ be the smallest positive integer such that at least
one of the relations (a), (b) and (c) is violated for the finite sample
$x_1, x_2, \cdots, x_{m(x)}$. Clearly, if $x$ is a point in $M^*$, then also any sample point $y$ is
in $M^*$ for which $y_1 = x_1, \cdots, y_{m(x)} = x_{m(x)}$. Let $x^0$ be any particular sample point
in $M^*$ and let $r(\xi_0, D_0 \mid x_1^0, \cdots, x_{m(x^0)}^0)$ denote the conditional risk when $\xi_0$ is the
a priori distribution in $\Omega$, $D_0$ is the decision function adopted and the first $m(x^0)$
observations are equal to $x_1^0, \cdots, x_{m(x^0)}^0$, respectively; i.e.,
$r(\xi_0, D_0 \mid x_1^0, \cdots, x_{m(x^0)}^0)$ is the conditional expected value of
$W(F, D_0(x)) + n(x, D_0)$, when $\xi_0$ is the a priori distribution in $\Omega$, $D_0$ is the
decision function adopted and $x_1^0, \cdots, x_{m(x^0)}^0$ are the first $m(x^0)$ observations.

¹² See also the proof of Lemma 1 in [4].

Let $D_1(x)$ be the decision function determined as follows: for any $x$ not in
$M^*$ we put $D_1(x) = D_0(x)$. For any $x$ in $M^*$, let $n(x, D_1)$ be equal to the smallest
integer $n(x) \ge m(x)$ for which

    $\rho_0(\xi_{x_1, \cdots, x_{n(x)}}) = \rho(\xi_{x_1, \cdots, x_{n(x)}})$

and the value of $D_1(x)$ is determined so that condition (c) of our theorem is
fulfilled. Since, for any positive integer $m$, the subset of $M^*$ where $m(x) = m$
is ($B$) measurable, $D_1(x)$ has the proper measurability properties. Applying
Theorem 3.6, we see that

(3.34) t{^o , A ) 1 *' * ) p(f*i. 

for any $x$ in $M^*$. On the other hand, since $D_0$ violates at least one of the
conditions (a), (b), and (c) at every point $x$ in $M^*$, we have

(3.35) ^(fojA,®!) *'■ j ®in(»)) ^ 

for every $x$ in $M^*$. If the probability measure of $M^*$ is positive when $\xi_0$ is the
a priori probability measure, the above two relations imply that

    $r(\xi_0, D_0) > r(\xi_0, D_1).$

Thus, $D_0$ is not a Bayes solution and the proof of Theorem 3.7 is complete.

We shall now prove the following continuity theorem.¹³

Theorem 3.8. Let $\{\xi_i\}$ ($i = 0, 1, 2, \cdots$, ad inf.) be a sequence of probability
measures on $\Omega$ such that

(3.36)    $\lim_{i \to \infty} \dfrac{\xi_i(\omega)}{\xi_0(\omega)} = 1$ uniformly in $\omega$.

Then

(3.37)    $\lim_{i \to \infty} \rho(\xi_i) = \rho(\xi_0).$

Proof: It follows from (3.36) that for any $\epsilon > 0$, we have for almost all
values $i$

(3.38)    $\dfrac{\xi_i(\omega)}{\xi_0(\omega)} < 1 + \epsilon$ and $\dfrac{\xi_0(\omega)}{\xi_i(\omega)} < 1 + \epsilon$ for all $\omega$.

Our theorem is an immediate consequence of (3.38) and Theorem 3.4.







A stronger continuity theorem is the following:

Theorem 3.8.1. Let $\{\xi_i\}$ ($i = 0, 1, 2, \cdots$, ad inf.) be a sequence of probability
measures on $\Omega$ such that

    $\lim_{i \to \infty} \xi_i(\omega) = \xi_0(\omega)$

uniformly in $\omega$. Then (3.37) holds.

Proof: It follows from (3.11) that

    $\lim_{m \to \infty} \rho_m(\xi) = \rho(\xi)$

uniformly in $\xi$. Hence it is sufficient to prove that, under the conditions of the
theorem,

    $\lim_{i \to \infty} \rho_m(\xi_i) = \rho_m(\xi_0)$

for any $m$. Let $D^m(x)$ denote a decision function for which $n(x, D^m) \le m$ for
all $x$. It follows that, for a fixed $m$, $r(F, D^m)$ is bounded, uniformly in $F$ and $D^m$
(Assumptions 3 and 4). From the hypothesis on $\{\xi_i\}$ it then follows that

    $\lim_{i \to \infty} r(\xi_i, D^m) = r(\xi_0, D^m)$

uniformly in $D^m$. From this the desired result follows readily.

A class $C$ of probability measures $\xi$ on $\Omega$ will be said to be convex if for any
two elements $\xi_1$ and $\xi_2$ of $C$ and for any positive value $\lambda < 1$, the probability
measure $\xi = \lambda\xi_1 + (1 - \lambda)\xi_2$ is an element of $C$.

For any element $d_0$ of $D^*$, let $C_{i,d_0}$ denote the class of all probability measures
$\xi$ of type $i$ ($i = 1, 2, 3$) for which $W(\xi, d_0) = \min_d W(\xi, d)$. Let $C_d$ denote the
set-theoretical sum of $C_{1,d}$ and $C_{2,d}$. We shall now prove the following theorem.

Theorem 3.9. For any element $d$, the classes $C_{1,d}$ and $C_d$ are convex.¹⁴

Let $\xi_1$ and $\xi_2$ be two elements of $C_{1,d}$. Then for any decision function $D(x)$
which requires at least one observation we have

(3.39)    $W(\xi_1, d) < r(\xi_1, D)$ and $W(\xi_2, d) < r(\xi_2, D)$.

Let $\xi = \lambda\xi_1 + (1 - \lambda)\xi_2$ where $\lambda$ is a positive number $< 1$. Clearly,

(3.40)    $W(\xi, d) = \lambda W(\xi_1, d) + (1 - \lambda) W(\xi_2, d)$

and

(3.41)    $r(\xi, D) = \lambda r(\xi_1, D) + (1 - \lambda) r(\xi_2, D).$

From (3.39), (3.40) and (3.41) we obtain

(3.42)    $W(\xi, d) < r(\xi, D)$ and $W(\xi, d) = \min_{d^*} W(\xi, d^*)$.

Hence $\xi$ is an element of $C_{1,d}$ and the convexity of $C_{1,d}$ is proved. The convexity
of $C_d$ can be proved in the same way by replacing $<$ by $\le$ in (3.39) and (3.42).
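The passage from (3.39), (3.40) and (3.41) to (3.42) can be written out in two short chains of inequalities. The following is only a sketch filling in the omitted step; $d^*$ denotes an arbitrary element of $D^*$, and the second chain uses the fact that $d$ minimizes both $W(\xi_1, \cdot)$ and $W(\xi_2, \cdot)$ because $\xi_1$ and $\xi_2$ belong to $C_{1,d}$.

```latex
\begin{align*}
W(\xi, d) &= \lambda W(\xi_1, d) + (1-\lambda) W(\xi_2, d) \\
          &< \lambda r(\xi_1, D) + (1-\lambda) r(\xi_2, D) = r(\xi, D), \\[4pt]
W(\xi, d) &= \lambda W(\xi_1, d) + (1-\lambda) W(\xi_2, d) \\
          &\le \lambda W(\xi_1, d^*) + (1-\lambda) W(\xi_2, d^*) = W(\xi, d^*).
\end{align*}
```

The strict inequality in the first chain relies on $0 < \lambda < 1$, so that both terms carry positive weight.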


¹⁴ See also Lemma 3 in [4].





We shall say that a set $L$ of probability measures $\xi$ is a linear manifold if
for any two elements $\xi_1$ and $\xi_2$ of $L$, $\xi = \alpha\xi_1 + (1 - \alpha)\xi_2$ is also an element of
$L$ for any real value $\alpha$ for which $\alpha\xi_1 + (1 - \alpha)\xi_2$ is a probability measure. A
linear manifold $L$ will be said to be tangent to $C_d$ if the intersection of $L$ and
$C_{2,d}$ is not empty, but the intersection of $L$ and $C_{1,d}$ is empty.

For any decision function $D(x)$ and for any element $d$ of $D^*$, let $L(D, d)$
denote the linear manifold consisting of all $\xi$ which satisfy the equation

(3.43)    $W(\xi, d) = r(\xi, D).$

Theorem 3.10. Let $\xi_0$ be an element of $C_{2,d}$ and let $D_0(x)$ be a decision function
that requires at least one observation and is such that $W(\xi_0, d) = r(\xi_0, D_0)$. Then
the linear manifold $L(D_0, d)$ is tangent to $C_d$.

Proof: $\xi_0$ is obviously an element of $L(D_0, d)$. Thus the intersection of $L(D_0, d)$
and $C_{2,d}$ is not empty. For any element $\xi_1$ of $C_{1,d}$ we have $W(\xi_1, d) < r(\xi_1, D)$
for any $D$ that requires at least one observation. Hence, $W(\xi_1, d) < r(\xi_1, D_0)$
and, therefore, $\xi_1$ cannot be an element of $L(D_0, d)$. This proves our theorem.


4. Applications to the case where $\Omega$ and $D^*$ are finite. In this section we shall
apply the general results of the preceding section to the following special case: the
space $\Omega$ consists of a finite number of elements, $F_1, \cdots, F_k$ ($k > 1$), and the space
$D^*$ consists of the elements $d_1, \cdots, d_k$ where $d_i$ denotes the decision to accept
the hypothesis $H_i$ that $F_i$ is the true distribution. Let

(4.1)    $W(F_i, d_j) = W_{ij} = 0$ for $i = j$ and $> 0$ for $i \ne j$.

The extension to $k > 3$ will be obvious. We shall first consider the case $k = 2$. In this
case, any a priori distribution $\xi$ is represented by two numbers $g_1$ and $g_2$, where
$g_i \ge 0$ ($i = 1, 2$) and $g_1 + g_2 = 1$. Let $\xi_i$ denote the a priori distribution
corresponding to $g_i = 1$ ($i = 1, 2$). Clearly, $\xi_i$ is an element of $C_{d_i}$. By Theorem
3.9, $C_{d_1}$ and $C_{d_2}$ are convex. Furthermore, we obviously have

(4.2)    $g_1 W_{12} \ge g_2 W_{21}$ for all $\xi$ in $C_{d_1}$,

and

(4.3)    $g_1 W_{12} \le g_2 W_{21}$ for all $\xi$ in $C_{d_2}$.

Let $\xi_0 = (g_1^0, g_2^0)$ be the a priori distribution for which

(4.4)    $g_2^0 W_{21} = g_1^0 W_{12}.$

It follows that there exist two positive numbers $c'$ and $c''$ such that

(4.5)    $0 < c' \le g_2^0 \le c'' < 1$





and such that the class $C_{d_1}$ consists of all $\xi$ for which $g_2 \le c'$, and the class $C_{d_2}$ consists of all $\xi$ for which $g_2 \ge c''$.

Thus, the following decision procedure will be a Bayes solution relative to the a priori distribution $\xi = (g_1, g_2)$: If $g_2 \le c'$ or $\ge c''$, do not take any observations and make the corresponding final decision. If $c' < g_2 < c''$, continue taking observations until the a posteriori probability of $H_2$ is either $\ge c''$ or $\le c'$. If this a posteriori probability is $\ge c''$, accept $H_2$, and if it is $\le c'$, accept $H_1$.

The a posteriori probability of $H_i$ after the first $m$ observations have been made is given by

$g_{im} = \dfrac{g_i\, p(x_1 \mid F_i) \cdots p(x_m \mid F_i)}{g_1\, p(x_1 \mid F_1) \cdots p(x_m \mid F_1) + g_2\, p(x_1 \mid F_2) \cdots p(x_m \mid F_2)}$.

If $c' < g_2 < c''$ and if the probability (under $F_1$ as well as under $F_2$) is zero that $g_{2m} = c'$ or $= c''$ for some $m$, then it follows from Theorem 3.8 that the above described Bayes solution is essentially unique; i.e., any other Bayes solution can differ from the one given above only on a set whose probability measure is zero under both $F_1$ and $F_2$.

Provided that at least one observation is made, one can easily verify that the above described Bayes solution is identical with a sequential probability ratio test for testing $H_1$ against $H_2$. The sequential probability ratio test is defined as follows (see [3]): Two positive constants $A$ and $B$ $(B < A)$ are chosen. Experimentation is continued as long as the probability ratio

$\dfrac{p_{2m}}{p_{1m}} = \dfrac{p(x_1 \mid F_2) \cdots p(x_m \mid F_2)}{p(x_1 \mid F_1) \cdots p(x_m \mid F_1)}$

satisfies the inequality $B < p_{2m}/p_{1m} < A$. If $p_{2m}/p_{1m} \ge A$, accept $H_2$. If $p_{2m}/p_{1m} \le B$, accept $H_1$. The Bayes solution described above coincides with this probability ratio test for properly chosen values of the constants $A$ and $B$.
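The correspondence between the posterior-probability rule and the probability ratio test can be sketched numerically. The following is only an illustrative sketch: the function names, the Bernoulli densities and the thresholds are assumptions made for the example, not part of the paper. It uses the fact that the a posteriori probability of $H_2$ is an increasing function of the ratio $p_{2m}/p_{1m}$, so that the limits $c'$, $c''$ translate into constants $B$, $A$.

```python
def ratio_path(xs, p1, p2):
    """Yield (m, p_2m / p_1m) after each of the first m observations."""
    ratio = 1.0
    for m, x in enumerate(xs, start=1):
        ratio *= p2(x) / p1(x)
        yield m, ratio


def sprt_constants(g2, c_lo, c_hi):
    """Translate posterior limits c' < c'' into ratio limits B < A (assumes 0 < g2 < 1)."""
    prior_odds = (1.0 - g2) / g2  # g1 / g2
    A = prior_odds * c_hi / (1.0 - c_hi)
    B = prior_odds * c_lo / (1.0 - c_lo)
    return A, B


def bayes_rule(xs, p1, p2, g2, c_lo, c_hi):
    """Stop when the a posteriori probability of H2 leaves (c', c'')."""
    g1 = 1.0 - g2
    for m, ratio in ratio_path(xs, p1, p2):
        posterior_h2 = g2 * ratio / (g1 + g2 * ratio)
        if posterior_h2 >= c_hi:
            return "H2", m
        if posterior_h2 <= c_lo:
            return "H1", m
    return "continue", len(xs)


def sprt_rule(xs, p1, p2, A, B):
    """Stop when the probability ratio leaves (B, A)."""
    for m, ratio in ratio_path(xs, p1, p2):
        if ratio >= A:
            return "H2", m
        if ratio <= B:
            return "H1", m
    return "continue", len(xs)


# Hypothetical example: Bernoulli observations, success probability 0.3 under F1, 0.7 under F2.
def p1(x):
    return 0.3 if x == 1 else 0.7


def p2(x):
    return 0.7 if x == 1 else 0.3


sample = [1, 1, 0, 1, 1, 1]
A, B = sprt_constants(g2=0.5, c_lo=0.1, c_hi=0.9)
print(bayes_rule(sample, p1, p2, 0.5, 0.1, 0.9), sprt_rule(sample, p1, p2, A, B))
```

Both rules stop at the same observation with the same decision, which is the sense in which the two procedures coincide once at least one observation is taken.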

The results described above for $k = 2$ are essentially the same as those contained in Lemmas 1 and 2 of an earlier publication [4] of the authors.

We shall now discuss the case $k = 3$. Any a priori distribution $\xi$ can be represented by a point with the barycentric coordinates $g_1$, $g_2$ and $g_3$, where $g_i$ is the a priori probability of $H_i$ $(i = 1, 2, 3)$. The totality of all possible a priori distributions $\xi$ will fill out the triangle $T$ with the vertices $O_1$, $O_2$ and $O_3$ where $O_i$ represents the a priori distribution corresponding to $g_i = 1$ (see Figure 1).

Clearly, the vertex $O_i$ is contained in $C_{d_i}$. Thus, because of Theorem 3.9, $C_{d_i}$ $(i = 1, 2, 3)$ is a convex subset of $T$ containing the vertex $O_i$, as indicated in Figure 1.

If one of the components of $\xi = (g_1, g_2, g_3)$ is zero, say $g_i = 0$, then $H_i$ can be disregarded and the problem of constructing Bayes solutions reduces to the previously considered case where $k = 2$. Thus, in particular, the determination of the boundary points $P_1, P_2, \ldots, P_6$ of $C_{d_1}$, $C_{d_2}$ and $C_{d_3}$ which are on the boundary of the triangle $T$, reduces to the previously considered case, $k = 2$.





It follows from Theorems 3.8 and 3.9 that the intersection of $C_{d_i}$ with any straight line $T_i$ through $O_i$ is a closed segment. One endpoint of this segment is, of course, $O_i$. Let $B_i$ denote the other endpoint. It follows from Theorem 3.7 that $B_i$ must be a point of $C_{2,d_i}$. Any interior point of $O_iB_i$ can be shown to be an element of $C_{1,d_i}$. The proof of this is very similar to that of Theorem 3.9.

We shall now show how tangents to the sets $C_{d_1}$, $C_{d_2}$ and $C_{d_3}$ can be constructed at the boundary points $P_1, P_2, \ldots, P_6$. Consider, for example, the boundary point $P_1$ of $C_{d_1}$ that lies on the line $O_1O_2$. Let $\xi_1$ be the a priori distribution represented by the point $P_1$. Since the a priori probability of $H_3$ is zero according to $\xi_1$, we can disregard $H_3$ in constructing Bayes solutions relative to $\xi_1$. Let $D_1(x)$ be a sequential probability ratio test for testing $H_1$ against $H_2$


Fig. 1


which requires at least one observation and which is a Bayes solution relative to $\xi_1$. Since $P_1$ is a boundary point, such a decision function $D_1$ exists. Thus, we have

(4.8) $W(\xi_1, d_1) = r(\xi_1, D_1) = \inf_D r(\xi_1, D)$.

Let $\alpha_{ij}$ denote the probability of accepting $H_j$ when $H_i$ is true and $D_1$ is the decision function adopted. Let, furthermore, $n_i$ denote the expected number of observations required by the decision procedure when $F_i$ is true and $D_1$ is adopted. Then, for any a priori distribution $\xi = (g_1, g_2, g_3)$ we have

$r(\xi, D_1) = \sum_i \sum_j g_i W_{ij} \alpha_{ij} + \sum_i g_i n_i$

and

(4.10) $W(\xi, d_1) = \sum_i g_i W_{i1}$.




Thus, the linear manifold $L(D_1, d_1)$ is simply the straight line given by the equation

(4.11) $\sum_i g_i W_{i1} = \sum_i \sum_j g_i W_{ij} \alpha_{ij} + \sum_i g_i n_i$.

This straight line goes through $P_1$ and, because of Theorem 3.10, it is tangent to $C_{d_1}$. Tangents at the points $P_2, \ldots, P_6$ can be constructed in a similar way.

The convexity properties of the sets $C_{d_i}$ $(i = 1, 2, \ldots, k)$ were established by the authors prior to the more general results described in Sections 2 and 3 and were stated by one of the authors in an address given at the Berkeley meeting of the Institute of Mathematical Statistics, June, 1948. More general results when $\Omega$ and $D^*$ are finite, admitting also non-linear cost functions, were obtained later by Arrow, Blackwell and Girshick [2].

REFERENCES 

[1] A. Wald, "Foundations of a general theory of sequential decision functions," Econometrica, Vol. 15 (1947), pp. 279-313.

[2] K. J. Arrow, D. Blackwell, M. A. Girshick, "Bayes and minimax solutions of sequential decision problems," Econometrica, Vol. 17 (1949), pp. 213-244.

[3] A. Wald, Sequential Analysis, John Wiley & Sons, New York, 1947.

[4] A. Wald and J. Wolfowitz, "Optimum character of the sequential probability ratio test," Annals of Math. Stat., Vol. 19 (1948), pp. 326-339.

[5] S. Saks, Theory of the Integral, Hafner Publishing Company, New York.

[6] A. Wald, "Statistical decision functions," Annals of Math. Stat., Vol. 20 (1949), pp. 165-205.



ON THE DISTRIBUTIONS OF MIDRANGE AND SEMI-RANGE IN 
SAMPLES FROM A NORMAL POPULATION 

K. C. S. Pillai

University of Travancore, Trivandrum. 

1. Summary. In this paper the simultaneous distribution of midrange and semi-range has been obtained and used to derive the distributions of midrange and semi-range in samples taken from a normal population.

2. Introduction. The concept of ordering a sample has given rise to innumerable problems for statistical investigation. Several authors have contributed to the study of ordered individuals and, in particular, to the study of extreme individuals, their sum and difference in samples from a normal population. L. H. C. Tippett [1] has studied the first four moments of the range and has tabled the mean range for sample sizes ranging from two to a thousand. Student [2] has determined the nature of the distribution of range for particular sample sizes by purely empirical methods. T. Hojo [3] has compared the standard error of midrange to that of median and mean in normal samples. E. S. Pearson and H. O. Hartley [4] have tabled the values of the probability integral of range for sample sizes up to twenty. E. J. Gumbel [5], [6], [7] has established the independence of the extreme values in large samples from populations of unlimited range and obtained the distributions of range and midrange. The asymptotic distribution of range has also been investigated by G. Elfving [8]. J. F. Daly [9] has devised a t-test adopting range in place of standard deviation in Student's t and in a modified t-test E. Lord [10] has used range instead of standard deviation. An extension to two populations of an analogue of Student's t-test using the sample range has been worked out by John E. Walsh [11]. S. S. Wilks [12] has given a complete and detailed account of the researches on order statistics and also a number of suggestions regarding possibilities of utilising order statistics in statistical inference. In this paper the distribution of midrange has been developed as a series and a method of evaluating the probability integral for semi-range based on an infinite series expansion for the normal probability integral has been suggested.


3. Distributions of midrange and semi-range. Let

$x_1 \le x_2 \le \cdots \le x_n$

be an ordered sample from a normal population with zero mean and unit standard deviation. Then the joint distribution of $x_1$ and $x_n$, the lowest and highest values respectively, is given by [13],

(1) $p(x_1, x_n) = [n(n - 1)/2\pi]\, e^{-(x_1^2 + x_n^2)/2} \left[ \int_{x_1}^{x_n} e^{-t^2/2}\, dt/\sqrt{2\pi} \right]^{n-2}$.

Let

$M = (x_1 + x_n)/2$

and

$W = (x_n - x_1)/2$.

$M$ is the midrange and $W$ is the semi-range of the sample. From (1) the simultaneous distribution of $M$ and $W$ reduces to

(2) $p(M, W) = [n(n - 1)/\pi]\, e^{-(M^2 + W^2)} \left[ \int_{M-W}^{M+W} e^{-t^2/2}\, dt/\sqrt{2\pi} \right]^{n-2}$.
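As a plausibility check on the joint density of midrange and semi-range, one can integrate it numerically over $-\infty < M < \infty$, $W \ge 0$ and confirm that the total is close to one. The sketch below assumes the form $p(M, W) = [n(n-1)/\pi]\, e^{-(M^2+W^2)} [\Phi(M+W) - \Phi(M-W)]^{n-2}$, with $\Phi$ the standard normal distribution function; the function names, grid width and step size are illustrative choices only.

```python
import math


def normal_interval_probability(a, b):
    """Phi(b) - Phi(a) for the standard normal distribution."""
    return 0.5 * (math.erf(b / math.sqrt(2.0)) - math.erf(a / math.sqrt(2.0)))


def joint_density(m, w, n):
    """Assumed joint density of midrange m and semi-range w >= 0 for sample size n."""
    core = normal_interval_probability(m - w, m + w) ** (n - 2)
    return n * (n - 1) / math.pi * math.exp(-(m * m + w * w)) * core


def total_probability(n, half_width=6.0, step=0.05):
    """Midpoint-rule integral of the joint density over [-half_width, half_width] x [0, half_width]."""
    cells = int(round(half_width / step))
    total = 0.0
    for i in range(-cells, cells):
        m = (i + 0.5) * step
        for j in range(cells):
            w = (j + 0.5) * step
            total += joint_density(m, w, n)
    return total * step * step


for n in (2, 3, 5):
    print(n, round(total_probability(n), 4))
```

For $n = 2$ the density factorises as $(2/\pi) e^{-M^2} e^{-W^2}$, so the integral can also be checked by hand.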

It has been shown [14] that if 

r -Af-t-Jf “ifc 

(3) FiM, W) = \ dt/V^ , 

(4) F{M, 17) = -I- + • • • + -{-•••], 

where coefficient is given by 

2iA?^ = - hVWi[At^^^W 

(5) 

-f 4fc“T7Vr(4) + • • 4-i4^”l7'‘"Vr(2i)]. 
Using expansion (4) equation (2) reduces to 

(6) p(ilf, 17) = [7i(n - E . 

It is evident that the $A$'s involve terms of the form

where s, q, m are positive integers and 

m) = r 

Jo 

Integrating (6) with respect to $W$,

(7) $p(M) = [n(n - 1)/\pi]\, e^{-nM^2/2} \sum_{i=0}^{\infty} B_i M^{2i}$

where

(8) $B_0 = \sqrt{\pi/2}\, I(n - 2, 0, 2)$,

(9) $B_1 = [(n - 2)/2]\,[\sqrt{\pi/2}\, I(n - 2, 0, 2) - I(n - 3, 1, 3)]$,

(10) $B_2 = [(n - 2)/2^2\Gamma(3)]\,[\sqrt{\pi/2}\,(n - 2)\, I(n - 2, 0, 2) - (2n - 5)\, I(n - 3, 1, 3) - (1/3)\, I(n - 3, 3, 3) + \sqrt{2/\pi}\,(n - 3)\, I(n - 4, 2, 4)]$

where

(11) $I(s, q, m) = \sqrt{2/\pi} \int_0^\infty [\phi(x)]^s x^q e^{-mx^2/2}\, dx$.

Using the method of integration by parts, the evaluation of $I(s, q, m)$ can be reduced ultimately to that of $I(p, 0, r)$ and this function for different values of $p$ and $r$ is given in Table I.
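The first few $B$ coefficients can be recomputed by direct numerical integration as a cross-check on the tabled values. The sketch below rests on assumptions that are not spelled out legibly in the source: $\phi(x)$ is taken to be $\sqrt{2/\pi}\int_0^x e^{-t^2/2}dt$, the integral $I(s, q, m)$ is taken to carry a factor $\sqrt{2/\pi}$, and the coefficient of $I(n-3, 1, 3)$ in $B_2$ is taken as $2n - 5$. With these conventions the values for $n = 3$ agree with the first row of Table II; the function names and integration settings are illustrative.

```python
import math


def phi(x):
    """Assumed phi(x) = sqrt(2/pi) * integral_0^x exp(-t^2/2) dt."""
    return math.erf(x / math.sqrt(2.0))


def I(s, q, m, upper=12.0, steps=24000):
    """Assumed I(s, q, m) = sqrt(2/pi) * integral_0^inf phi(x)^s x^q exp(-m x^2 / 2) dx (midpoint rule)."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += phi(x) ** s * x ** q * math.exp(-0.5 * m * x * x)
    return math.sqrt(2.0 / math.pi) * total * h


def b_coefficients(n):
    """Return (B0, B1, B2) for sample size n >= 3 under the stated assumptions."""
    if n < 3:
        raise ValueError("this sketch covers n >= 3 only")
    root = math.sqrt(math.pi / 2.0)
    k = n - 2
    i_k02 = I(k, 0, 2)
    i_113 = I(k - 1, 1, 3)
    b0 = root * i_k02
    b1 = (k / 2.0) * (root * i_k02 - i_113)
    bracket = root * k * i_k02 - (2 * n - 5) * i_113 - I(k - 1, 3, 3) / 3.0
    if n > 3:
        # The last term carries the factor (n - 3) and vanishes for n = 3.
        bracket += (n - 3) / root * I(k - 2, 2, 4)
    b2 = (k / 8.0) * bracket
    return b0, b1, b2


print([round(b, 8) for b in b_coefficients(3)])
```

The same routine can be run for larger $n$ to compare against the other rows of the table.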


TABLE I

Values of Integrals I(p, 0, r)*

p \ r        2               4               6               8
1       0.277,063,21    0.147,583,62    0.100,735,97    0.076,490,19
2       0.152,980,4     0.064,094,20    0.037,255,93    0.025,060,53
3       0.098,373       0.033,453,6     0.016,808,71
4       0.069,10        0.019,535,1     0.008,589,67
5       0.051,44        0.012,325,6
6       0.039,90        0.008,223,9
7       0.031,94
8       0.026,17





The first five B coefficients for n ranging from 3 to 10 are tabled below.

TABLE II

Values of B Coefficients.

n         B0              B1              B2              B3              B4
3     0.347,247,25    0.040,642,87    0.002,772,90    0.000,133,80    0.000,005,00
4     0.191,732       0.058,751       0.010,906       0.001,460       0.000,153
5     0.123,292       0.067,184       0.021,526       0.004,988       0.000,909
6     0.086,60        0.070,93        0.033,23        0.011,20        0.002,97
7     0.064,47        0.072,20        0.046,65        0.020,28        0.007,14
8     0.050,01        0.072,09        0.057,22        0.032,21        0.014,69
9     0.040,03        0.071,27        0.068,96        0.047,01        0.024,98
10    0.032,80        0.069,97        0.080,31        0.064,66        0.040,51


* The integrals have been evaluated by using (14). 





The accuracy obtained by keeping the first five terms in p(M) may be judged from the following values of the total probability calculated for small values of n.

TABLE III

Total probability keeping the first five terms in p(M)

Size of sample       3    4    5    6    7
Total probability







Integrating (6) with respect to $M$, $p(W)$ may be obtained. But $p(W)$ involves the integral $\phi(W)$ and to evaluate the probability integral of $W$ expansions for $\phi(W)$ and its powers have to be developed.

Since $\phi(W) = \sqrt{2/\pi} \int_0^W e^{-t^2/2}\, dt = \sqrt{2/\pi}\,(W - W^3/6 + \cdots)$,

a convenient expansion is given by

(12) $\int_0^W e^{-t^2/2}\, dt = W e^{-W^2/6}\,(1 + a_2 W^4 + \cdots + a_i W^{2i} + \cdots)$

where $a_i$ follows the recurrence relation

(13) $3(2i + 1)\, a_i - a_{i-1} = (-1)^i / 3^{i-1}\Gamma(i + 1)$,

as may be seen by differentiating (12) with respect to $W$ and equating the coefficient of $W^{2i}$ on both sides. Again

(14) $[\phi(W)]^j = (2/\pi)^{j/2}\, W^j e^{-jW^2/6}\, S^j$

where

(15) $S = 1 + a_2 W^4 + a_3 W^6 + \cdots + a_i W^{2i} + \cdots$

and

(16) $S^j = 1 + K_2^{(j)} W^4 + K_3^{(j)} W^6 + \cdots$

where

(17) $K_i^{(j)} = \sum {}^{j}C_s\, s!\; a_1^{s_1} a_2^{s_2} \cdots a_i^{s_i} / s_1!\, s_2! \cdots s_i!$

and

(17a) $s_1 + 2s_2 + \cdots + i s_i = i$, $\quad s_1 + s_2 + \cdots + s_i = s$.

Clearly $K_i^{(1)} = a_i$.

In evaluating the $K_i^{(j)}$'s summation with respect to $s$ is first performed, the values of $s_1, s_2, \ldots$ being obtained so as to satisfy the relations (17a); and thereafter the values of the $a$'s are substituted. It may be noted that $a_1 = 0$. The $K$ coefficients for $j$ up to 8 and $i$ up to 13 are given below.
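The recurrence for the $a$ coefficients is easy to run exactly. The sketch below assumes the recurrence in the form $3(2i + 1)a_i - a_{i-1} = (-1)^i / (3^{i-1}\, i!)$ with $a_0 = 1$, and assumes that the expansion being built is $\int_0^W e^{-t^2/2}dt = W e^{-W^2/6} \sum_i a_i W^{2i}$; the function names are illustrative. It confirms $a_1 = 0$ and $a_2 = 1/90 = 0.011,111,11\ldots$, the leading entry of the coefficient table.

```python
import math
from fractions import Fraction


def a_coefficients(max_i):
    """Exact a_0, ..., a_max_i from the assumed recurrence, starting at a_0 = 1."""
    a = [Fraction(1)]
    for i in range(1, max_i + 1):
        rhs = Fraction((-1) ** i, 3 ** (i - 1) * math.factorial(i))
        a.append((rhs + a[i - 1]) / (3 * (2 * i + 1)))
    return a


def integral_via_series(w, terms=30):
    """W * exp(-W^2/6) * sum a_i W^(2i), the assumed expansion of integral_0^W exp(-t^2/2) dt."""
    a = a_coefficients(terms)
    series = sum(float(a_i) * w ** (2 * i) for i, a_i in enumerate(a))
    return w * math.exp(-w * w / 6.0) * series


def integral_direct(w):
    """The same integral expressed through the error function."""
    return math.sqrt(math.pi / 2.0) * math.erf(w / math.sqrt(2.0))


a = a_coefficients(4)
print(a[1], a[2], a[3], a[4])
print(integral_via_series(1.0), integral_direct(1.0))
```

Because $a_1 = 0$, the coefficients of $W^4$ and $W^6$ in $S^j$ are simply $j a_2$ and $j a_3$, which gives a quick way to spot-check the first two columns of the coefficient table.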


TABLE IV 


Coefficients. 


t 

2 

i ^ 

4 j 5 

0 011,111,11 
0.022,222,22 
0.033,333,33 
0.044,444,44 
0.055,556,56 

0 066,666,67 

0 077,777,78 

0 088,888,89 

-0.0835,273,369 
-0,070,546,737 
-0.070,582,011 
-0.074,109,348 
-0 077,636,684 
-0.071,164,021 
-0.074,691,358 
-0.078,218,695 

0.074,091,711 -0 0 77.814,833 
0.0821,164,021 -0.071,401,493 
0.0850,264,550 -0 0<28,8G0,029 
0.0891,710,758 -0 .0854,157,091 
0.0814,550,265 -0 .0887,292,680 
0.0821,164,021 -0.Q812,826,680 
0,0829,012,346 -0.0817,707,944 
0 0838,095,238 - 0.0823,373,061 



3 

6 

7 

1 

0.070,087,459 

-0 0«38,065,882 

2 

0.073,059,860 

-0.0878,306,957 

3 

0 079,870,764 

-0 0835,414,321 

4 

0,072,515,888 

-0,0«96,195,746 

5 

0 075,264,163 

-0 0820,323,918 

6 

0 074,603,642 

-0.0‘36,960,883 

7 

0.071,905,926 

-0 0'60,836,892 

8 

0.0810,854,319 

-0.0'93,258,365 


a 


0 0n4,772,299 
0.0«67,379,G07 
0.0’37,246,805 
0.0»13,039,809 
0.0'33,614,797 
0.0'72,070,037 
0 043,654,992 
0.0*23,672,301 


0 

"-0.0"47,770,889 
-0.0« 32.240,604 
“0.08 20,934,251 
-0.0’ 10,793,811 
“0.0» 30.234,979 
-0.0^)8,563,784 
~0.0» 13,520,252 
-0.08 24,174.891 


3 


1 

2 

3 

4 

5 

6 

7 

8 



1 

i 

' 

10 

11 

12 

13 

0 0’74,640,444 
0.0“18,330,114 
0.08 21,506,614 
0 0®10,849,591 
0.0® 36,260,639 
0 0®95,092,297 
0.08 21,247,442 
0.08 42,365,199 

-0.0*70,268,872 
-0.0*891,361,579 
-0.0*74,469,203 
-0.0*887,178,260 
-0 0®32,719,538 
-0,0® 93,120,388 
-0 0*22,112,968 
-0.0*46,218,579 

0.0*810,369,029 
0,0*843,595,840 
0.0*890,601,910 
0.0**72,767,557 
0,0*®32,219,900 
0.0®10,472,881 
0,0®27,825,332 
0.0®64,147,144 

-0.0*824,535,539 
-0.0*79,132,452 
-0.0**58,727,028 
-0.0**54,213,617 
-0.0**27,049,719 
-0.0**90,020,717 
-0.0*827,369,553 
-0 0*“66,862,484 





Using (12) the probability integral for W can be evaluated with the help of 
tables of Incomplete Gamma Functions. 

REFERENCES

[1] L. H. C. Tippett, "On the extreme individuals and the range of samples taken from a normal population," Biometrika, Vol. 17 (1926), pp. 364-387.

[2] Student, "Errors in routine analysis," Biometrika, Vol. 19 (1927), pp. 151-164.

[3] T. Hojo, "Distribution of the median, quartiles and interquartile distance in samples from a normal population," Biometrika, Vol. 23 (1931), pp. 315-360.

[4] E. S. Pearson and H. O. Hartley, "The probability integral of the range in samples of n observations from a normal population," Biometrika, Vol. 32 (1942), pp. 301-310.

[5] E. J. Gumbel, "Ranges and midranges," The Annals of Math. Stat., Vol. 15 (1944), No. 4, pp. 414-422.

[6] E. J. Gumbel, "On the independence of the extremes in a sample," The Annals of Math. Stat., Vol. 17 (1946), No. 1, pp. 78-80.

[7] E. J. Gumbel, "The distribution of the range," The Annals of Math. Stat., Vol. 18 (1947), No. 3, pp. 384-412.

[8] G. Elfving, "The asymptotical distribution of range in samples from a normal population," Biometrika, Vol. 34 (1947), pp. 111-119.

[9] J. F. Daly, "On the use of the sample range in an analogue of Student's t-test," The Annals of Math. Stat., Vol. 17 (1946), No. 1, pp. 71-74.

[10] E. Lord, "The use of range in place of standard deviation in the t-test," Biometrika, Vol. 34 (1947), pp. 41-67.

[11] J. E. Walsh, "An extension to two populations of an analogue of Student's t-test using the sample range," Annals of Math. Stat., Vol. 18 (1947), No. 2, pp. 280-285.

[12] S. S. Wilks, "Order statistics," Am. Math. Soc. Bull., Vol. 54 (1948), pp. 6-50.

[13] S. S. Wilks, Mathematical Statistics, Princeton University Press, Princeton, 1943, p. 90.

[14] K. C. S. Pillai, "A note on ordered samples," Sankhya, Vol. 8 (1948), Part 4, pp. 375-380.



THE IMPOSSIBILITY OF CERTAIN SYMMETRICAL BALANCED
INCOMPLETE BLOCK DESIGNS

By S. S. Shrikhande
University of North Carolina

Introduction and Summary. An arrangement of $v$ varieties or treatments in $b$ blocks of size $k$, $(k < v)$, is known as a balanced incomplete block design if every variety occurs in $r$ blocks and any two varieties occur together in $\lambda$ blocks. These parameters obviously satisfy the equations

(1) $bk = vr$

(2) $\lambda(v - 1) = r(k - 1)$.

Fisher [1] has also proved that the inequality

(3) $b \ge v$, $r \ge k$

must hold. If $v$, $b$, $r$, $k$ and $\lambda$ are positive integers satisfying (1), (2) and (3), then a balanced incomplete block design with these parameters possibly exists, but the actual existence of a combinatorial solution is not ensured. These conditions are thus necessary but not sufficient for the existence of a design. Fisher and Yates in their tables [2] have listed all designs with $r \le 10$ and given combinatorial solutions, where known. A balanced incomplete block design in which $b = v$, and hence $r = k$, is called a symmetrical balanced incomplete block design. The impossibility of the symmetrical designs with parameters $v = b = 22$, $r = k = 7$, $\lambda = 2$ and $v = b = 29$, $r = k = 8$, $\lambda = 2$ was first demonstrated by Hussain [3], [4] essentially by the method of enumeration. The object of the present note is to give an alternative simple proof of the impossibility of these designs and to show that the only unknown remaining symmetrical design in Fisher and Yates' tables, viz. $v = b = 46$, $r = k = 10$, $\lambda = 2$, is definitely impossible. Symmetrical designs with $\lambda < 6$, $r, k \le 20$, which are impossible combinatorially, are also listed.

1. A necessary condition for the existence of a symmetrical balanced incomplete block design when $v$ is even.

Theorem 1. A necessary condition for the existence of a symmetrical balanced incomplete block design with parameters $v$, $r$ and $\lambda$, where $v$ is even, is that $r - \lambda$ be a perfect square.

Proof. Let $N = (n_{ij})$ be a square matrix of $v$ rows and $v$ columns where

(4) $n_{ij} = 1$ or $0$

according as the $i$-th treatment does or does not occur in the $j$-th block. Put

(5) $B = NN'$.

Since every treatment occurs in $r$ blocks and every pair of treatments in $\lambda$ blocks, we have, if the design is possible,

(6) $B = NN' = \begin{pmatrix} r & \lambda & \cdots & \lambda \\ \lambda & r & \cdots & \lambda \\ \vdots & \vdots & \ddots & \vdots \\ \lambda & \lambda & \cdots & r \end{pmatrix}$.

Subtracting the first column from all the other columns and then adding to the first row all the other rows, we see that

(7) $|B| = [r + \lambda(v - 1)](r - \lambda)^{v-1} = r^2 (r - \lambda)^{v-1}$ from (2).

But from (6)

(8) $|B| = |N|^2$.

Since $|N|$ is integral, it follows that $(r - \lambda)^{v-1}$ is the square of an integer, and hence if $v$ is even, $r - \lambda$ must be a perfect square.

Corollary. The following symmetrical designs are impossible.

(A1) v = b = 22, r = k = 7, λ = 2
(A2) v = b = 46, r = k = 10, λ = 2
(A3) v = b = 92, r = k = 14, λ = 2
(A4) v = b = 106, r = k = 15, λ = 2
(A5) v = b = 172, r = k = 19, λ = 2
(A6) v = b = 34, r = k = 12, λ = 4.

As already mentioned in the introduction, the impossibility of (A1) has been proved by Hussain [3], but for the design (A2) it was hitherto unknown whether or not a solution is possible and it was left as a blank in the latest edition of Fisher and Yates' tables.
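The criterion of Theorem 1 is simple enough to check mechanically. The following sketch (function names are illustrative, not from the paper) verifies for each design of the corollary that the parameters satisfy $\lambda(v - 1) = r(k - 1)$ with $r = k$, that $v$ is even, and that $r - \lambda$ is not a perfect square.

```python
import math


def is_perfect_square(m):
    """True when the non-negative integer m is a perfect square."""
    root = math.isqrt(m)
    return root * root == m


def excluded_by_theorem_1(v, k, lam):
    """True when a symmetrical design (v, k, lam) with even v fails the perfect-square condition."""
    if lam * (v - 1) != k * (k - 1):
        raise ValueError("parameters do not satisfy lam * (v - 1) = k * (k - 1)")
    return v % 2 == 0 and not is_perfect_square(k - lam)


# Designs (A1)-(A6) of the corollary, as (v, k, lam).
corollary_designs = [
    (22, 7, 2),
    (46, 10, 2),
    (92, 14, 2),
    (106, 15, 2),
    (172, 19, 2),
    (34, 12, 4),
]

for design in corollary_designs:
    print(design, excluded_by_theorem_1(*design))
```

A design with even $v$ that passes the test, such as $v = 16$, $k = 6$, $\lambda = 2$ (where $k - \lambda = 4$), is not excluded by this criterion.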


2. Application of method of Bruck and Ryser.

In a recent paper Bruck and Ryser [5] have proved the impossibility of some finite projective planes with the help of the properties of matrices whose elements are integers. Their method is immediately applicable to our own problem.

Let $A$ and $B$ be two symmetric matrices of order $n$ with elements in the rational field. The matrices $A$ and $B$ are congruent, written $A \sim B$, provided there exists a nonsingular matrix $C$ with elements in the rational field, such that $A = C'BC$. The congruence of matrices satisfies the usual requirements of an "equals" relationship.

If $A$ is an integral symmetric matrix of order $n$ and rank $n$, we can always construct an integral diagonal matrix $D = (d_1, \ldots, d_n)$, where $d_i \neq 0$, $i = 1, 2, \ldots, n$ such that $D \sim A$. The number of negative terms $i$, called the index of $A$, is an invariant by Sylvester's Law.





Define $d = (-1)^i \delta$ where $\delta$ is the square-free positive part of $|A|$. Then since $|B| = |C|^2 |A|$, $d$ is another invariant of $A$.

Now let $A$ be a nonsingular and symmetric integral matrix of order $n$. Let $D_r$ be the leading principal minor determinant of order $r$ and suppose that $D_r \neq 0$ for $r = 1, 2, \ldots, n$. Define

(9) $C_p(A) = (-1, -D_n)_p \prod_{i=1}^{n-1} (D_i, -D_{i+1})_p$

for every odd prime $p$ where $(m, m')_p$ is the Hilbert norm-residue symbol for arbitrary non-zero integers $m$ and $m'$ and for every prime $p$. The following two theorems are given in the collected works of Hilbert [6].

Theorem (A). If $m$ and $m'$ are integers not divisible by the odd prime $p$, then

(10) $(m, m')_p = +1$

(11) $(m, p)_p = (p, m)_p = (m/p)$,

where $(m/p)$ is the Legendre symbol. Moreover, if $m \equiv m' \not\equiv 0 \bmod p$, then

(12) $(m, p)_p = (m', p)_p$.

Theorem (B). For arbitrary non-zero integers $m$, $m'$, $n$, $n'$ and for every prime $p$,

(13) $(-m, m)_p = +1$

(14) $(m, n)_p = (n, m)_p$

(15) $(mm', n)_p = (m, n)_p (m', n)_p$

(16) $(m, nn')_p = (m, n)_p (m, n')_p$.

From the above it is easy to prove that for $p$ an odd prime and every positive integer $m$,

(17) $(m, m + 1)_p = (-1, m + 1)_p$

(18) $\prod_{j=1}^{m} (j, j + 1)_p = ((m + 1)!, -1)_p$.

We can now state the fundamental Minkowski-Hasse Theorem [7].

Theorem (C). Let $A$ and $B$ be two integral symmetric matrices of order $n$ and rank $n$. Suppose further that the leading principal minor determinants of $A$ and $B$ are different from zero. Then $A \sim B$ if and only if $A$ and $B$ have the same invariants $i$, $d$ and $C_p$ for every odd prime $p$.

3. A necessary condition for the existence of a symmetrical balanced incomplete block design for any integer $v$.

Suppose the symmetrical design with parameters $v$, $r$ and $\lambda$ exists. Then with the previous definition of $N$ and $B$,

$B = NN' = \begin{pmatrix} r & \lambda & \cdots & \lambda \\ \lambda & r & \cdots & \lambda \\ \vdots & \vdots & \ddots & \vdots \\ \lambda & \lambda & \cdots & r \end{pmatrix}$.

Subtracting the last row from the remaining rows and then subtracting the last column from all the other columns, we get

(19) $Q = \begin{pmatrix} 2(r - \lambda) & (r - \lambda) & \cdots & (r - \lambda) & -(r - \lambda) \\ (r - \lambda) & 2(r - \lambda) & \cdots & (r - \lambda) & -(r - \lambda) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ (r - \lambda) & (r - \lambda) & \cdots & 2(r - \lambda) & -(r - \lambda) \\ -(r - \lambda) & -(r - \lambda) & \cdots & -(r - \lambda) & r \end{pmatrix}$.

Obviously $Q \sim B$. But $B \sim I$. Hence $Q \sim I$ and, therefore, since $Q$ and $I$ satisfy all the conditions of Theorem C, they must have the same invariants $i$, $d$ and $C_p$. Let $D_j$ denote the leading principal minor determinant of $Q$ of order $j$. Then

(20) $D_j = (r - \lambda)^j (j + 1)$ for $j = 1, 2, \ldots, v - 1$

(21) and $D_v = |B| = r^2 (r - \lambda)^{v-1}$.

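Then, omitting $p$ for convenience,

$C_p(Q) = (-1, -D_v)(D_{v-1}, -D_v) \prod_{j=1}^{v-2} (D_j, -D_{j+1})$.

We use (10), ..., (18) in deriving the value of $C_p(Q)$.

Now

$(-1, -D_v)(D_{v-1}, -D_v)$

$= (-1, -r^2 (r - \lambda)^{v-1})\,((r - \lambda)^{v-1} v, -r^2 (r - \lambda)^{v-1})$

$= (-1, -r^2)(-1, (r - \lambda)^{v-1})((r - \lambda)^{v-1}, r^2)((r - \lambda)^{v-1}, -(r - \lambda)^{v-1})(v, r^2)(v, -(r - \lambda)^{v-1})$

$= (-1, (r - \lambda)^{v-1})(v, -(r - \lambda)^{v-1})$

$= (-1, (r - \lambda)^{v-1})(v, -1)(v, (r - \lambda))^{v-1}$.

Also

$\prod_{j=1}^{v-2} (D_j, -D_{j+1}) = \prod_{j=1}^{v-2} ((r - \lambda)^j (j + 1), -(r - \lambda)^{j+1} (j + 2))$

$= \left\{ \prod_{j=1}^{v-2} ((r - \lambda)^j, -(r - \lambda)^{j+1})(j + 1, -(j + 2)) \right\} S$

$= S \prod_{j=1}^{v-2} ((r - \lambda)^j, -(r - \lambda)^j)((r - \lambda)^j, (r - \lambda))(j + 1, j + 2)(j + 1, -1)$

$= S \prod_{j=1}^{v-2} ((r - \lambda), (r - \lambda))^j (j + 2, -1)(j + 1, -1)$

$= S \prod_{j=1}^{v-2} (r - \lambda, -1)^j (j + 2, -1)(j + 1, -1)$

$= S\,(r - \lambda, -1)^{(v-1)(v-2)/2} ((v - 1)!, -1)(v!, -1)$

$= S\,(r - \lambda, -1)^{(v-1)(v-2)/2} (v, -1)$,

where

$S = \prod_{j=1}^{v-2} ((r - \lambda)^j, j + 2)((r - \lambda)^{j+1}, j + 1)$

$= \prod_{j=1}^{v-2} ((r - \lambda)^j, j + 2)((r - \lambda)^{j-1}, j + 1)$

$= \prod_{j=1}^{v-2} ((r - \lambda)^j, j + 2) \prod_{j=0}^{v-3} ((r - \lambda)^j, j + 2)$

$= (r - \lambda, v)^{v-2}$.

Hence

$C_p(Q) = (r - \lambda, -1)^{v(v-1)/2} (v, -1)^2 (r - \lambda, v)^{2v-3}$

(22) $= (r - \lambda, -1)^{v(v-1)/2} (r - \lambda, v)^{2v-3}$.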

Hence we can enunciate the following theorem:

Theorem 2. A necessary condition for the existence of a symmetrical balanced incomplete block design with parameters $v$, $r$ and $\lambda$ is that

$C_p(Q) = (r - \lambda, -1)_p^{v(v-1)/2}\, (r - \lambda, v)_p^{2v-3} = +1$

for all odd primes $p$, where $(m, n)_p$ is the Hilbert norm-residue symbol.

When $v$ is even we have seen that a necessary condition for the existence of the design is that $r - \lambda$ be a perfect square. Then it is easily seen that

$C_p(Q) = +1$

for all odd primes $p$. Therefore, even if the design is really non-existent, its impossibility cannot be proved by this method.

When, however, $v$ is odd we can in many instances demonstrate the impossibility of the design.

Consider the design

(A7) $v = b = 29$, $r = k = 8$, $\lambda = 2$.

$C_p(Q) = (6, -1)_p^{406}\, (6, 29)_p^{55}$

$= (3, 29)_p (2, 29)_p$

$= (29/3)$ for $p = 3$

$= (2/3)$ for $p = 3$

$= -1$ for $p = 3$.

Hence the design (A7) is impossible. As mentioned in the introduction, the impossibility has already been demonstrated by Hussain [4] by a rather lengthy method amounting to a complete exhaustion of all possibilities. The following designs with $\lambda < 6$ and $r, k \le 20$ can be similarly proved to be impossible by applying Theorem 2.


(A8) v = b = 137, r = k = 17, λ = 2
(A9) v = b = 67, r = k = 12, λ = 2
(A10) v = b = 103, r = k = 18, λ = 3
(A11) v = b = 53, r = k = 13, λ = 3
(A12) v = b = 43, r = k = 15, λ = 5
(A13) v = b = 77, r = k = 20, λ = 5
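The test of Theorem 2 can be carried out mechanically for each listed design. The sketch below is an illustration under stated assumptions: the criterion is taken in the form $C_p = (r - \lambda, -1)_p^{v(v-1)/2}\,(r - \lambda, v)_p$ (only the parity of each exponent matters, and the exponent of the second symbol is assumed odd), and the Hilbert symbol for an odd prime $p$ is computed from the standard formula $(p^{\alpha}u, p^{\beta}w)_p = (-1)^{\alpha\beta(p-1)/2}\,(u/p)^{\beta}\,(w/p)^{\alpha}$ for units $u$, $w$. All function names are hypothetical. Only odd primes dividing $(r - \lambda)v$ need be examined, since the symbol is $+1$ when both arguments are units.

```python
def odd_prime_factors(m):
    """Distinct odd prime factors of a non-zero integer m."""
    m = abs(m)
    while m % 2 == 0:
        m //= 2
    factors = []
    p = 3
    while p * p <= m:
        if m % p == 0:
            factors.append(p)
            while m % p == 0:
                m //= p
        p += 2
    if m > 1:
        factors.append(m)
    return factors


def legendre(u, p):
    """Legendre symbol (u/p) for an odd prime p and u not divisible by p (Euler's criterion)."""
    return 1 if pow(u % p, (p - 1) // 2, p) == 1 else -1


def split_power(a, p):
    """Write a = p**alpha * u with u prime to p; return (alpha, u)."""
    alpha = 0
    while a % p == 0:
        a //= p
        alpha += 1
    return alpha, a


def hilbert(a, b, p):
    """Hilbert norm-residue symbol (a, b)_p for non-zero integers a, b and an odd prime p."""
    alpha, u = split_power(a, p)
    beta, w = split_power(b, p)
    sign = -1 if (alpha * beta * ((p - 1) // 2)) % 2 else 1
    return sign * legendre(u, p) ** beta * legendre(w, p) ** alpha


def c_p(v, k, lam, p):
    """Assumed form of the invariant: (k - lam, -1)_p^(v(v-1)/2) * (k - lam, v)_p."""
    n = k - lam
    value = hilbert(n, v, p)
    if (v * (v - 1) // 2) % 2:
        value *= hilbert(n, -1, p)
    return value


def excluded_by_theorem_2(v, k, lam):
    """True when some odd prime p gives C_p = -1."""
    n = k - lam
    return any(c_p(v, k, lam, p) == -1 for p in odd_prime_factors(n * v))


# Designs (A7)-(A13) as (v, k, lam).
listed_designs = [
    (29, 8, 2), (137, 17, 2), (67, 12, 2), (103, 18, 3),
    (53, 13, 3), (43, 15, 5), (77, 20, 5),
]

for design in listed_designs:
    print(design, excluded_by_theorem_2(*design))
```

Designs that are known to exist, for example $v = 7$, $k = 3$, $\lambda = 1$ or $v = 11$, $k = 5$, $\lambda = 2$, pass the test, as they must for a necessary condition.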





My thanks are due to Professor R. C. Bose under whose guidance this research was carried out.


REFERENCES 

[1] R. A. Fisher, "An examination of the different possible solutions of a problem in incomplete blocks," Annals of Eugenics, Vol. 10 (1940), pp. 52-75.

[2] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research (1949), Hafner Publishing Company, New York.

[3] Q. M. Hussain, "Impossibility of the symmetrical incomplete block design with λ = 2, k = 7," Sankhyā, Vol. 7 (1946), pp. 317-322.

[4] Q. M. Hussain, "Symmetrical incomplete block design with λ = 2, k = 8 or 9," Bulletin of the Calcutta Mathematical Society, XXXVII (1945), pp. 115-123.

[5] R. H. Bruck and H. J. Ryser, "Non-existence of certain finite projective planes," Canadian Jour. Math., Vol. 1 (1949), pp. 88-93.

[6] D. Hilbert, Gesammelte Abhandlungen I (Berlin, 1932), pp. 161-173.

[7] H. Hasse, "Über die Äquivalenz quadratischer Formen im Körper der rationalen Zahlen," J. Reine Angew. Math., Vol. 152 (1923), pp. 205-224.



NOTES 

This section is devoted to brief research and expository articles and other short items.


THE SAMPLING DISTRIBUTION OF THE RATIO OF TWO RANGES
FROM INDEPENDENT SAMPLES*

Richard F. Link

University of Oregon

Let us consider a sample of $n$ ordered observations $(x_1 < x_2 < \cdots < x_n)$ drawn from a population with variance $\sigma^2$. Let $w = (x_n - x_1)/\sigma$. Let us consider the joint sampling distribution of $w_1$ and $w_2$ for two samples, not necessarily the same size, drawn from populations with the same variance. If the two samples were drawn independently, then the joint sampling distribution of $w_1$ and $w_2$ may be written as the product of the sampling distributions of $w_1$ and $w_2$.

If we make the change of variable $r = w_1/w_2$, $w_2 = w$, and if $w$ is integrated over its range of definition, the cumulative distribution of the ratio of two ranges remains. This may be written as

(1) $F(R) = \int_0^R dr \int_0^\infty dw \cdot w \cdot h_1(rw) \cdot h_2(w)$,

where $h_1$ is the pdf for $w_1$ and $h_2$ is the pdf for $w_2$.

To obtain more explicit results, specific distribution functions may be considered. The following table gives the sampling distribution of the ratio of two ranges from independent samples for the indicated density functions $f(x)$. Notice that for the normal distribution it was possible to obtain results only for some special cases.

In Table 1 for $F(R)$, $w_1$ and $w_2$ represent ranges computed from samples of size $n_1$ and $n_2$ respectively.

Notice that formula (1) for $F(R)$ is equivalent to the following expression:

$\int_0^\infty dw_2 \int_0^{R w_2} dw_1\, h_1(w_1) \cdot h_2(w_2)$.

The region of integration for the last expression is simply the region in the $w_2$, $w_1$ plane to the right of the line $w_1 = R w_2$.

This integration was done numerically. Table 2 gives values of $R$ for all combinations of $n_1$ and $n_2 \le 10$ and for $\alpha$ = .005, .01, .025, .05, .10 such that

$\Pr(w_1/w_2 < R) = \alpha$

where $w_1$ and $w_2$ are ranges computed from samples of size $n_1$ and $n_2$ drawn from normal populations with the same variance. It is believed that these values are correct to within one place in the last reported figure.

* This work was done under contract N6onr-218/iv with the Office of Naval Research.

These tabled values may be used as critical values for testing the hypothesis that two independent samples were drawn from normal populations with the same variance. This test is therefore comparable to the F test. Some sort of

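TABLE 1

Density $f(x)$ and $F(R) = \Pr(w_1/w_2 < R)$

$f(x) = 1$ for $0 < x < 1$, $f(x) = 0$ for all other $x$:

$F(R) = n_2(n_2 - 1)\, R^{n_1 - 1} \left[ \dfrac{n_1}{n_1 + n_2 - 2} - \dfrac{n_1 + R(n_1 - 1)}{n_1 + n_2 - 1} + \dfrac{R(n_1 - 1)}{n_1 + n_2} \right]$, $0 < R \le 1$

$f(x) = e^{-x}$ for $0 < x < \infty$, $f(x) = 0$ for $x < 0$:

$F(R) = (n_2 - 1)! \sum_{i=0}^{n_1 - 1} (-1)^i \binom{n_1 - 1}{i} \dfrac{1}{(1 + iR)(2 + iR) \cdots (n_2 - 1 + iR)}$

$f(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ for $-\infty < x < \infty$:

$n_1 = 2$, $n_2 = 2$: $F(R) = \dfrac{2}{\pi} \tan^{-1} R$

$n_1 = 2$, $n_2 = 3$: $F(R) = \dfrac{6}{\pi} \tan^{-1} \dfrac{R}{\sqrt{4 + 3R^2}}$

$n_1 = 3$, $n_2 = 2$: $F(R) = \dfrac{6}{\pi} \tan^{-1} \sqrt{3 + 4R^2} - 2$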

ni = 3j na “ 3 

f” r27r r 2 

/ dr — <—iu tan”’ u — v tan”’ v) 

Jo V’ 

1 u^y 

-\ --- iw tan”’ TV ~ u tan”’ w) -i- - tan' ’ 2ry ^ 

0r’(l -f- r’l r j J 

where 

u - (3(r» d- l)]-i tu - (7r> + 3)-‘ 

V - (4r’ + 3)-* y - (3r» -f- 4)”‘ 


measure of the relative performance of these two tests seems desirable. An attempt to measure the performance of this test relative to the F test was made by comparing the tolerance intervals of the distribution of this ratio with those of the F test.

The length of the interval containing the central $1 - 2\alpha$ proportion of the distribution of F was compared with a similar length for the distribution of $w_1/w_2$ for $n_1 = n_2 = n$. The square of the ratio of these lengths will be called $\delta_\alpha$.












TABLE 2

Pr(w1/w2 < R) = .005

n1 \ n2     2       3       4       5       6       7       8       9       10
2         .0078   .0052   .0043   .0039   .0038   .0037   .0036   .0035   .0034
3         .096    .071    .059    .054    .051    .048    .045    .042    .041
4         .21     .16     .14     .13     .12     .12     .11     .11     .10
5         .30     .24     .22     .20     .19     .18     .18     .17     .16
6         .38     .32     .28     .26     .25     .24     .23     .22     .22
7         .44     .38     .34     .32     .30     .29     .28     .27     .26
8         .49     .43     .39     .36     .35     .33     .32     .31     .30
9         .54     .47     .43     .40     .38     .37     .36     .35     .34
10        .67     .50     .46     .44     .42     .40     .39     .38     .37




Pr(w1/w2 < R) = .01

n1 \ n2     2       3       4       5       6       7       8       9       10
2         .0157   .0106   .0080   .0070   .0068   .0066   .0063   .0062   .0061
3         .136    .100    .084    .079    .073    .069    .066    .062    .060
4         .26     .20     .18     .17     .16     .16     .14     .14     .13
5         .38     .30     .26     .24     .23     .22     .21     .21     .20
6         .46     .37     .33     .31     .29     .28     .27     .26     .20
7         .53     .43     .39     .36     .34     .33     .32     .31     .30
8         .69     .49     .44     .41     .39     .37     .36     .35     .34
9         .64     .63     .48     .46     .43     .41     .40     .39     .38
10        .68     .57     .62     .49     .46     .45     .43     .42     .41



Pr(w1/w2 < R) = .025

n1 \ n2     2       3       4       5       6       7       8       9       10
2         .039    .026    .019    .018    .017    .016    .016    .016    .016
3         .217    .160    .137    .124    .116    .107    .102    .098    .096
4         .37     .28     .26     .23     .21     .20     .19     .18     .18
5         .60     .39     .34     .32     .30     .28     .27     .26     .26
6         .60     .47     .42     .38     .36     .34     .33     .32     .31
7         .68     .64     .48     .44     .42     .40     .38     .37     .36
8         .74     .59     .53     .49     .46     .44     .43     .42     .41
9         .79     .64     .57     .63     .60     .48     .47     .46     .44
10        .83     .68     .61     .57     .54     .52     .50     .49     .48







TABLE 2 (Continued)

Pr(w1/w2 < R) = .05

n1 \ n2     2       3       4       5       6       7       8       9       10
2         .079    .052    .039    .036    .034    .032    .031    .030    .028
3         .31     .23     .20     .18     .16     .15     .14     .14     .13
4         .60     .37     .32     .29     .27     .26     .25     .24     .23
5         .62     .49     .42     .40     .36     .35     .33     .32     .31
6         .74     .57     .50     .46     .43     .41     .40     .38     .37
7         .80     .64     .57     .52     .49     .47     .45     .44     .43
8         .86     .70     .62     .57     .54     .51     .50     .48     .47
9         .91     .75     .67     .61     .58     .65     .53     .52     .61
10        .95     .80     .70     .65     .61     .59     .57     .55     .54




$\Pr(W_1/W_2 < R) = .10$

n_1 \ n_2     2       3       4       5       6       7       8       9      10

    2       .168    .105    .077    .074    .069    .066    .062    .059    .056
    3       .46     .33     .28     .25     .23     .22     .21     .20     .19
    4       .67     .49     .42     .38     .36     .34     .32     .31     .30
    5       .84     .62     .53     .48     .46     .43     .41     .39     .38
    6       .97     .72     .62     .56     .52     .50     .48     .46     .45
    7      1.07     .80     .69     .63     .69     .56     .54     .62     .50
    8      1.16     .87     .75     .68     .64     .61     .68     .56     .54
    9      1.21     .92     .80     .73     .68     .65     .62     .60     .58
   10      1.26     .98     .85     .77     .72     .68     .66     .64     .62



TABLE 3

   n     Relative precision of the range     Squared ratio
         as an estimate of $\sigma$          of lengths

   2              1.00                          1.00
   3               .99                           .99
   4               .98                           .97
   5               .96                           .96
   6               .93                           .92
   7               .91                           .90
   8               .89                           .89
   9               .87                           .88
  10               .85                           .86















FRANK J. MASSEY, JR. 


For statistics having normal sampling distributions such a ratio would be independent of $\alpha$ and would be equivalent to the ratio of the variances of these sampling distributions. It was found that the squared ratio is independent of $\alpha$ except for a maximum change of 1 in the second decimal for the values of $\alpha$ = .005, .01, .025, .05, .10. These values are presented in Table 3 along with the relative precision of the range as an estimate of $\sigma$ as given by Mosteller [1].

It is interesting to note that the squared ratio corresponds very closely to the relative precision of the range as an estimate of $\sigma$.

REFERENCE 

[1] F. Mosteller, "On some useful 'inefficient' statistics," Annals of Math. Stat., Vol. 17 (1946), pp. 377-408.


A NOTE ON THE ESTIMATION OF A DISTRIBUTION FUNCTION BY 

CONFIDENCE LIMITS 

By Frank J. Massey, Jr.

University of Oregon

Let $F(x)$ be the continuous cumulative distribution function of a random variable $X$, and let $x_1 < x_2 < \cdots < x_n$ be the results of $n$ independent observations on $X$ arranged in order of size. We wish to estimate $F(x)$ by means of the band $S_n(x) \pm \lambda/\sqrt{n}$, where $S_n(x)$ is defined by

$$S_n(x) = \begin{cases} 0 & \text{if } x < x_1, \\ k/n & \text{if } x_k \le x < x_{k+1}, \\ 1 & \text{if } x \ge x_n. \end{cases}$$

Thus we wish to know the probability, say $P_n(\lambda)$, that the band is such that $S_n(x) - \lambda/\sqrt{n} < F(x) < S_n(x) + \lambda/\sqrt{n}$ for all $x$. This problem has been previously studied [1] [2] [3] [4] [5] and a limiting distribution has been obtained [1] [4] [5] and tabled [3] [4]. However, apparently no error terms for the limiting distribution, or practical methods of obtaining $P_n(\lambda)$, have been given. Such a method is given here.

It has been shown [2] that $P_n(\lambda)$ is independent of $F(x)$ provided only that $F(x)$ is continuous, and thus it is sufficient to consider only the case

$$F(x) = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x \le 1, \\ 1 & \text{if } x > 1. \end{cases}$$

We will find the probability that $S_n(x)$ falls wholly in the band $F(x) \pm k/n$ (here $\lambda = k/\sqrt{n}$) where $k$ is an integer or a rational number, and intermediate values may be obtained by interpolation. To illustrate the method we shall assume that $k$ is an integer.



Divide the interval (0, 1) into $n$ parts by the points $1/n, 2/n, \cdots, (n-1)/n$. The step function rises by jumps of exactly $1/n$. Thus, in order to be inside the band at $x = i/n$, $S_n(x)$ would have to pass through exactly one of the lattice points whose ordinates are $(i-k+1)/n, (i-k+2)/n, \cdots, (i+k-1)/n$. Suppose that the step function stays inside the band by means of $\alpha_i$ of the observations falling in the interval $\left(\frac{i-1}{n}, \frac{i}{n}\right)$, $i = 1, 2, \cdots, n$. The a priori probability of this happening is given by the multinomial law as

$$\Pr(\alpha_1, \cdots, \alpha_n) = \frac{n!}{\alpha_1! \cdots \alpha_n!}\left(\frac{1}{n}\right)^{\alpha_1} \cdots \left(\frac{1}{n}\right)^{\alpha_n} = \frac{n!}{\alpha_1! \cdots \alpha_n!}\,\frac{1}{n^n},$$

since $\sum_{i=1}^{n} \alpha_i = n$.

Thus the probability of the step function staying in the band is given by

$$P_n(\lambda) = \sum \frac{n!}{n^n}\,\frac{1}{\alpha_1! \cdots \alpha_n!} = \frac{n!}{n^n} \sum \frac{1}{\alpha_1! \cdots \alpha_n!},$$

where the summation is over all possible combinations of $\alpha_1, \cdots, \alpha_n$ such that

$$\max_x |S_n(x) - x| < \frac{\lambda}{\sqrt{n}} \quad \text{and} \quad \sum_{i=1}^{n} \alpha_i = n.$$

Let $U_j(m) = \sum \dfrac{1}{\alpha_1! \cdots \alpha_m!}$, $j = 1, 2, \cdots, 2k-1$, be the sum of all the terms indicated such that $S_n(x)$ arrives at the lattice point $\left(\frac{m}{n}, \frac{m-k+j}{n}\right)$ by a route that stays inside the band. Since $S_n(x)$ is non-decreasing it can only pass through a point

$$\left(\frac{m+1}{n}, \frac{m-k+j+1}{n}\right), \quad m = 0, 1, \cdots, n-1; \; j = 1, 2, \cdots, 2k-1,$$

if it previously passed through one of the points

$$\left(\frac{m}{n}, \frac{m-k+1}{n}\right), \left(\frac{m}{n}, \frac{m-k+2}{n}\right), \cdots, \left(\frac{m}{n}, \frac{m-k+j+1}{n}\right).$$

If it passed through $\left(\frac{m}{n}, \frac{m-k+h}{n}\right)$ the value of $\alpha_{m+1}$ would have to be $(j+1-h)$ and the product $\dfrac{1}{(j+1-h)!}\,U_h(m)$ would be part of $U_j(m+1)$. This is true for all $h = 1, 2, \cdots, j+1$ and all of these terms would give different paths for $S_n(x)$, so we have

$$U_j(m+1) = \sum_{h=1}^{j+1} \frac{1}{(j+1-h)!}\,U_h(m), \quad j = 1, 2, \cdots, 2k-1,$$

where it is understood $U_h(m) = 0$ if $h > m + k$.



Thus we have a set of $2k - 1$ linear homogeneous difference equations. They may be reduced to a single difference equation by eliminating $2k - 2$ of the variables by substitution. This results in the following difference equation.


X (- D *um - 1 - h 4* m) - 0. 
n-i ft I 


TABLE 1

   k     n = 5      10       20       25       30       35       40       45

  1.0    .0384    .0004
  1.5    .3276    .0449
  2.0    .6521    .2513    .0238
  2.5    .8880    .5139
  3.0    .9699    .7331    .2955
  3.5    .9947    .8522
  4.0    .99935   .9410    .6473
  5.0             .9922    .8624    .7637    .6629    .5074    .4808    [illegible]
  6.0             .9994    .9669    .9057    .8420    .7725    .7016    .0322
  7.0                      .9892    .9683    .9350    .8945    .8471    .7902
  8.0                      .9979    .9911    .9774    .9560    .9295    .8974
  9.0                      .9997    .9979    .9931    .9842    .9708    .9529


   k     n = 50     55       60       65       70       75       80

  5.0    .3377    .2807    .2324    .1918    .1677    .1294    .1000
  6.0    .5662    .6046    .4478    .3954    .3492    .3072    .2090
  7.0    .7439    .6916    .6403    .5908    .5435    .4987    .4506
  8.0    .8616    .8234    .7837    .7434    .7031    .0033    .0244
  9.0    .9312    .9063    .8789    .8496    .8189    .7874    [illegible]


Initial conditions on either the simultaneous equations or on the single equation are

$$U_i(0) = 0 \text{ for } i \neq k, \qquad U_i(0) = 1 \text{ for } i = k.$$

After values of $U_k(n)$ have been found the value of $P_n(\lambda)$ can be found by multiplying $U_k(n)$ by $n!/n^n$.

Values of $U_k(n)$ can be obtained numerically either from the simultaneous equations or from the single equation. Table 1 was computed partly by numerical solution of the simultaneous equations above and partly by setting up similar equations connecting $U_j(m+5)$ to $U_i(m)$, $i = 1, 2, \cdots, j+5$. Either method could be set up on punch cards if an extensive table was desired. Notice that to get $U_k(n)$ all $U_k(i)$, $i = 1, 2, \cdots, n-1$, are also found. Table 1 gives some computed values of $P_n(\lambda)$, $\lambda = k/\sqrt{n}$. Table 2 gives results interpolated from Table 1, showing the approach of $P_n(\lambda)$ to its limiting distribution.

If the width of the band is $2k/(ln)$, where $k$ and $l$ are integers, a similar procedure to that above can be used. However, instead of dividing the interval (0, 1) into $n$ parts it is necessary to divide it into $l \cdot n$ parts.
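The recursion and initial conditions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `band_probability` and the dictionary keyed by the number of observations already placed (rather than by the index $j$) are choices made here, not part of the note, and only the integer-$k$ case is covered.

```python
from math import factorial


def band_probability(n: int, k: int) -> float:
    """Probability that S_n(x) stays strictly inside F(x) +/- k/n (k an integer)."""
    # weights[s]: sum of 1/(a_1! ... a_m!) over all ways of placing s observations
    # in the first m intervals while staying inside the band at every lattice point.
    weights = {0: 1.0}  # at x = 0 the step function is at height 0
    for m in range(1, n + 1):
        lowest = max(0, m - k + 1)   # admissible ordinates at x = m/n are
        highest = min(n, m + k - 1)  # (m-k+1)/n, ..., (m+k-1)/n, clipped to [0, 1]
        new_weights = {}
        for s in range(lowest, highest + 1):
            new_weights[s] = sum(
                w / factorial(s - prev)  # s - prev observations fall in interval m
                for prev, w in weights.items()
                if prev <= s
            )
        weights = new_weights
    # multiply the final sum by n!/n^n, as in the text
    return factorial(n) / n**n * weights.get(n, 0.0)


if __name__ == "__main__":
    print(round(band_probability(5, 1), 4))
```

For $n = 5$, $k = 1$ the only admissible path puts one observation in each interval, so the sketch returns $5!/5^5 = .0384$, the first entry of Table 1.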


TABLE 2

   n     λ = .5     1.0     1.10     1.20     1.30     1.40

  10      .06       .78      .85      .91      .95      .97
  20      .05       .77      .86      .91      .94      .97
  30      .05       .70      .85      .90      .94      .96
  40      .04       .76      .84      .90      .94      .96
  50      .04       .75      .84
  60      .03       .75      .84
  70      .03       .75      .83
  80      .63       .74
   ∞      .007      .730     .822     .888     .932     .960


It has been suggested [2] that instead of a band bounded by $y = x \pm c$ it might be convenient to use a band bounded by the lines $y = px + q$ and $y = p'x + q'$. If $p = p'$ and if $p$, $q$, $q'$ are rational the probability of $S_n(x)$ staying inside the band can be evaluated by the method presented above. If $p \neq p'$ and if $p$, $p'$, $q$, $q'$ are all rational a similar procedure could be used but it would be very tedious.

[1] N. Smirnov, "Sur les écarts de la courbe de distribution empirique," Recueil Math. de Moscou, Vol. 6 (1939), pp. 3-26.

[2] A. Wald and J. Wolfowitz, "Confidence limits for continuous distribution functions," Annals of Math. Stat., Vol. 10 (1939), pp. 105-118.

[3] N. Smirnov, "On the estimation of the discrepancy between empirical curves of distribution for two independent samples," Bulletin Mathématique de l'Université de Moscou, Vol. 2 (1939), fasc. 2.

[4] A. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Ist. Ital. Attuari, Giorn., Vol. 4 (1933), pp. 1-11.

[5] W. Feller, "On the Kolmogorov-Smirnov limit theorems for empirical distributions," Annals of Math. Stat., Vol. 19 (1948), pp. 177-190.




FREDERICK MOSTELLER AND JOHN W. TUKEY


SIGNIFICANCE LEVELS FOR A k-SAMPLE SLIPPAGE TEST*

Frederick Mosteller and John W. Tukey

Harvard University and Princeton University 

1. Summary. Mosteller has recently [1, 1948] proposed a $k$-sample slippage test and has given percentage points for selected $n$, $k$ and $r$ for the case of $k$ equal samples of size $n$. When the samples are of unequal size, exact significance levels can be calculated very quickly from

$$P_r = \frac{\sum_i n_i^{(r)}}{N^{(r)}}, \quad \text{where } x^{(r)} = x(x-1) \cdots (x-r+1),$$

by the method explained in section 3 below.

The significance values for $k$ equal samples of $n > 10$ are very well approximated by

where $N = kn$.

A convenient rough approximation for unequal samples may be given in terms of $k^*$, an "effective" number of samples, which is given by

$$k^* = \frac{(\sum n_i)^2}{\sum n_i^2};$$

the one-sided significance level will then be approximately given by

$$P_r = (k^*)^{-(r-1)}.$$

This approximation can be easily applied with the aid of Table 1. Thus, for example, with four samples of sizes 7, 5, 5, 2, we have

$$k^* = \frac{(7 + 5 + 5 + 2)^2}{49 + 25 + 25 + 4} = \frac{361}{103} = 3.5,$$

whence from the table $r = 3$ lies at a one-sided level approximately between 5% and 10%, $r = 4$ approximately between 1% and 2.5%, $r = 5$ between 0.5% and 1%. Exact calculation yields 5.7%, 1.2%, 0.2% and 0.03% for $r$ = 3, 4, 5 and 6. The approximation is, in this example, quite satisfactory for moderate significance levels and conservative for more extreme significance levels.
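The rough approximation of the summary is easy to try out. The sketch below is illustrative only; the function names `effective_samples` and `approx_level` are hypothetical, and the formula used is the one stated above, $P_r \approx (k^*)^{-(r-1)}$ with $k^* = (\sum n_i)^2 / \sum n_i^2$.

```python
def effective_samples(sizes):
    """Effective number of samples k* = (sum n_i)^2 / sum n_i^2."""
    total = sum(sizes)
    return total * total / sum(n * n for n in sizes)


def approx_level(sizes, r):
    """Rough one-sided significance level (k*)^-(r-1)."""
    return effective_samples(sizes) ** (-(r - 1))


if __name__ == "__main__":
    sizes = [7, 5, 5, 2]
    print(round(effective_samples(sizes), 2))
    for r in (3, 4, 5):
        print(r, round(100 * approx_level(sizes, r), 2), "%")
```

For the sizes 7, 5, 5, 2 this gives $k^* = 361/103$ and approximate levels that fall inside the brackets read from Table 1.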


2. Derivation. The statistic considered by Mosteller is the number of cases in one sample greater than all cases in all the $k - 1$ other samples. We derive its distribution briefly.

Since the statistic depends only on the order of the $n_1 + n_2 + \cdots + n_k = N$ values, we can consider the actual values taken on to be fixed, and consider their allotment to the various samples. Assuming all of them to come from a single continuous distribution, we may consider these fixed values to be all distinct, and any way of allotting them to labelled places in the various samples as equally likely.

Consider the $r$ largest values. They can all be allotted to places in the $i$-th sample in $n_i^{(r)} = n_i(n_i - 1) \cdots (n_i - r + 1)$ ways, and to arbitrary places in $N^{(r)}$ ways. Thus they will be allotted to some single sample in the fraction

$$\frac{\sum_i n_i^{(r)}}{N^{(r)}}$$

of all cases. This is clearly the probability that Mosteller's statistic is $r$ or more.

* Prepared in connection with research sponsored by the Office of Naval Research.

TABLE 1


Approximate critical values of k* for various levels of significance

One-sided level    10%      5%     2.5%     1%     0.5%    0.2%    0.1%
Two-sided level    20%     10%      5%      2%      1%     0.4%    0.2%

r = 2             10.0    20.0    40.0   100.0   200.0   500.0  1000.0
r = 3              3.2     4.5     6.3    10.0    14.1    22.4    31.6
r = 4                      2.7     3.4     4.6     5.8     7.9    10.0
r = 5                              2.5     3.2     3.8     4.7     5.6
r = 6                                      2.5     2.9     3.5     4.0
r = 7                                                      2.8     3.2
r = 8                                                              2.6


3. Unequal samples: an exact computation. Our practical problem is to compute $P_r$ for small values of $r$ and a fixed set of $n_i$. If we recognize the numerators as the unnormalized factorial moments of the distribution of sample sizes, we see that the computation goes smoothly according to the scheme shown in Table 2 (where the columns of multipliers $n - 1$, $n - 2$, $n - 3$, etc. may be partially covered for convenience during the computation). For example: 132 = 11(12), 1320 = 10(132), $\cdots$, 42 = 6(7). The numbers in the last line of Table 2 give successively the percentages $100 P_1$, $100 P_2$, $\cdots$. Of course $P_1 = 1$ because some sample must have the largest value. It is clear that exact computation for any reasonable set of $n_i$ is quite easy.
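The exact computation can also be sketched directly from the formula $P_r = \sum n_i^{(r)} / N^{(r)}$. This is an illustration rather than the tabular scheme of Table 2; the names `falling_factorial` and `exact_level` are hypothetical, exact rational arithmetic is a choice made here, and $r \le N$ is assumed.

```python
from fractions import Fraction


def falling_factorial(x, r):
    """x^(r) = x (x - 1) ... (x - r + 1), with x^(0) = 1."""
    result = 1
    for i in range(r):
        result *= x - i
    return result


def exact_level(sizes, r):
    """Exact probability that the r largest values all fall in one sample."""
    total = sum(sizes)  # N; r is assumed not to exceed N
    numerator = sum(falling_factorial(n, r) for n in sizes)
    return Fraction(numerator, falling_factorial(total, r))


if __name__ == "__main__":
    sizes = [12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 7, 4]
    for r in (1, 2, 3):
        print(r, round(100 * float(exact_level(sizes, r)), 2), "%")
```

Samples smaller than $r$ contribute zero automatically, because their falling factorial contains a zero factor.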

4. Equal samples: an approximation. In the case of $k$ equal samples, we have

$$P_r = \frac{k\,n^{(r)}}{N^{(r)}}.$$

Let us try to approximate to $x^{(r)}$ by expansion in powers. We have

$$n^{(r)} = n(n-1) \cdots (n-r+1) = n^r (1 - 1/n)(1 - 2/n) \cdots (1 - (r-1)/n),$$

so that

$$\log n^{(r)} = r \log n + \sum_{x=1}^{r-1} \log(1 - x/n)$$

$$= r \log n - \sum_{x=1}^{r-1} \left(x/n + x^2/2n^2 + x^3/3n^3 + \cdots\right)$$

$$= r \log n - r(r-1)/2n - r(r-1)(2r-1)/12n^2 + O(n^{-3}),$$

TABLE 2
Sample Computation

for $\{n_i\}$ = (12, 11, 11, 11, 10, 10, 10, 10, 9, 9, 7, 4)











and finally 

(3) 




5. Comparison of results. The results obtained with various equal sample approximations may be compared with the exact values for several cases. The effective number of samples, $k^*$, used with (1), (2), and (3), is computed from

$$k^* = \frac{(\sum n_i)^2}{\sum n_i^2},$$

a formula which is often an easy and effective way to allow for different sizes of samples.


TABLE 3

Comparison of Approximations

                                                        P_r in %
Sizes of samples          N      k*     r    exact     (1)      (2)      (3)      (4)

10, 10, 10, 10            40    4.00    2    23.08    25.00    23.19    23.13   <25.00
                                        3     4.85     6.25     4.99     4.80    <6.25

7, 5, 5, 2                19    3.50    2    24.56    28.53    25.01    24.82   <28.53
                                        3     5.67     8.14     5.48     5.18    <8.76

12, 11, 11, 11,          114   11.46    2     7.92     8.73     7.96     7.96    <8.73
10, 10, 10, 10,                         3     0.58     0.76     0.58     0.66    <0.78
9, 9, 7, 4


A fourth approximation, which always gives a conservative estimate of the significance of the result, is obtained by replacing $n^{(r)}$ by $n^r$ throughout; this gives

(4)   $P_r \approx \dfrac{\sum n_i^r}{N^r}$,

which is equivalent to approximation (1) when the samples are of equal size, or when $r = 2$.

The results are shown in Table 3.

Thus it seems clear that either (1) or (4) is good enough for rough work. The choice will depend on which formula one prefers to remember. The amount of work is about the same for either method. When something better is required the exact method of section 3 seems appropriate. Indeed some may prefer it to any approximation.

REFERENCE

[1] Frederick Mosteller, "A k-sample slippage test for an extreme population," Annals of Math. Stat., Vol. 19 (1948), pp. 58-65.



JACK SHERMAN AND WINIFRED J. MORRISON


ADJUSTMENT OF AN INVERSE MATRIX CORRESPONDING TO A 
CHANGE IN ONE ELEMENT OF A GIVEN MATRIX 

By Jack Sherman and Winifred J. Morrison
The Texas Company Research Laboratories, Beacon, New York

1. Introduction. Many methods have been published in recent years for carrying out the numerical computation of the inverse of a matrix [1], [2]. In all these methods, the amount of computation increases rapidly with increase in order of the matrix.

The utility of a computational method for obtaining the inverse of a matrix would be increased considerably if the inverse could be transformed in a simple manner, corresponding to some specified change in the original matrix, thus eliminating the necessity of computing the new inverse from the beginning. The problem that is considered in the present paper is one of changing one element in the original matrix, and of computing the resulting changes in the elements of the new inverse directly from those of the old inverse.


2. Computational method. Let

$a_{ij}$, $i = 1, 2, \cdots, n$, $j = 1, 2, \cdots, n$, denote the elements of an $n$th order square matrix $a$;

$b_{ij}$ denote the elements of $b$, the inverse of $a$;

$A_{ij}$ denote the elements of $A$, which differs from $a$ only in one element, say $A_{RS}$;

$B_{ij}$ denote the elements of $B$, the inverse of $A$.

Let

$$A_{RS} = a_{RS} + \Delta a_{RS}.$$

The set of equations by means of which $B$ may be computed from $\Delta a_{RS}$ and $b$ is

(1)   $B_{rj} = b_{rj} - \dfrac{b_{rR}\, b_{Sj}\, \Delta a_{RS}}{1 + b_{SR}\, \Delta a_{RS}}$,   $r = 1, 2, \cdots, n$;  $j = 1, 2, \cdots, n$,

provided that $1 + b_{SR}\, \Delta a_{RS} \neq 0$.

The validity of equation (1) may be demonstrated by multiplying through by $A_{ir}$ $(r = 1, 2, \cdots, n)$ and adding the results:

(2)   $\sum_{r=1}^{n} A_{ir} B_{rj} = \sum_{r=1}^{n} A_{ir} b_{rj} - \dfrac{b_{Sj}\, \Delta a_{RS}}{1 + b_{SR}\, \Delta a_{RS}} \sum_{r=1}^{n} A_{ir} b_{rR}$,   $(i = 1, 2, \cdots, n;\; j = 1, 2, \cdots, n)$.

Consider separately the equations for which $i \neq R$, and for which $i = R$.

Case I. $i \neq R$. By hypothesis, $A_{ir} = a_{ir}$ for $i \neq R$. Hence equations (2) become

(3)   $\sum_{r=1}^{n} A_{ir} B_{rj} = \sum_{r=1}^{n} a_{ir} b_{rj} - \dfrac{b_{Sj}\, \Delta a_{RS}}{1 + b_{SR}\, \Delta a_{RS}} \sum_{r=1}^{n} a_{ir} b_{rR}$,   $(i = 1, 2, \cdots, R-1, R+1, \cdots, n;\; j = 1, 2, \cdots, n)$.

The last sum vanishes because $a$ and $b$ are inverse matrices, and hence

(4)   $\sum_{r=1}^{n} A_{ir} B_{rj} = \sum_{r=1}^{n} a_{ir} b_{rj}$,   $(i = 1, 2, \cdots, R-1, R+1, \cdots, n;\; j = 1, 2, \cdots, n)$.

Case II. $i = R$. Equation (2) becomes

(5)   $\sum_{r=1}^{n} A_{Rr} B_{rj} = \sum_{r=1}^{n} A_{Rr} b_{rj} - \dfrac{b_{Sj}\, \Delta a_{RS}}{1 + b_{SR}\, \Delta a_{RS}} \sum_{r=1}^{n} A_{Rr} b_{rR}$,   $(j = 1, 2, \cdots, n)$.

In each of the summations, there will be a term for which $r = S$, in which case $A_{RS} = a_{RS} + \Delta a_{RS}$. In all other cases, $A_{Rr} = a_{Rr}$. Hence (5) can be written as

(6)   $\sum_{r=1}^{n} A_{Rr} B_{rj} = \sum_{r=1}^{n} a_{Rr} b_{rj} + \Delta a_{RS}\, b_{Sj} - \dfrac{b_{Sj}\, \Delta a_{RS}}{1 + b_{SR}\, \Delta a_{RS}} \left( \sum_{r=1}^{n} a_{Rr} b_{rR} + \Delta a_{RS}\, b_{SR} \right)$,   $(j = 1, 2, \cdots, n)$.

Since $a$ and $b$ are inverse matrices, the second summation on the right-hand side of (6) is equal to unity, and hence (6) becomes

(7)   $\sum_{r=1}^{n} A_{Rr} B_{rj} = \sum_{r=1}^{n} a_{Rr} b_{rj}$,   $(j = 1, 2, \cdots, n)$.

The sets of equations (4) and (7) can be written as one set of equations:

(8)   $\sum_{r=1}^{n} A_{ir} B_{rj} = \sum_{r=1}^{n} a_{ir} b_{rj}$,   $(i = 1, 2, \cdots, n;\; j = 1, 2, \cdots, n)$,

and hence $B$ is the inverse of $A$.


3. Illustrative numerical example. In actual applications, equations (1) are conveniently subdivided into three groups, namely, those for which $r = S$, those for which $j = R$, and all others. In the first two cases, these reduce to

(9)   $B_{Sj} = \dfrac{b_{Sj}}{1 + b_{SR}\, \Delta a_{RS}}$,   $(j = 1, 2, \cdots, n)$,

(10)   $B_{rR} = \dfrac{b_{rR}}{1 + b_{SR}\, \Delta a_{RS}}$,   $(r = 1, 2, \cdots, n)$.

By utilizing (10), (1) becomes

(11)   $B_{rj} = b_{rj} - B_{rR}\, b_{Sj}\, \Delta a_{RS}$,   $(r = 1, 2, \cdots, S-1, S+1, \cdots, n;\; j = 1, 2, \cdots, R-1, R+1, \cdots, n)$.

Equations (9) and (10) show that the elements of $B$ contained in the $S$th row and $R$th column are directly proportional to the corresponding elements of $b$.

Consider

$$a = \begin{pmatrix} 2.384 & 1.238 & 0.861 & 2.413 \\ 0.648 & 1.113 & 0.761 & 0.137 \\ 1.119 & 0.643 & 3.172 & 1.139 \\ 0.745 & 2.137 & 1.268 & 0.542 \end{pmatrix}.$$

The inverse of $a$ turns out to be

$$b = \begin{pmatrix} 0.2220 & 2.5275 & -0.1012 & -1.4145 \\ -0.04806 & -0.2918 & -0.1999 & 0.7079 \\ -0.1692 & 0.01195 & 0.3656 & -0.01824 \\ 0.2801 & -2.3517 & 0.07209 & 1.0409 \end{pmatrix}.$$

Assume that $a_{24}$ is increased by 0.4, so that

$$A = \begin{pmatrix} 2.384 & 1.238 & 0.861 & 2.413 \\ 0.648 & 1.113 & 0.761 & 0.537 \\ 1.119 & 0.643 & 3.172 & 1.139 \\ 0.745 & 2.137 & 1.268 & 0.542 \end{pmatrix}.$$

Then (9), (10), and (11) become

$$B_{4j} = \frac{b_{4j}}{1 - 2.3517 \times 0.4} = 16.857\, b_{4j}, \quad (j = 1, 2, \cdots, n),$$

$$B_{r2} = 16.857\, b_{r2}, \quad (r = 1, 2, \cdots, n),$$

$$B_{rj} = b_{rj} - 0.4\, B_{r2}\, b_{4j}, \quad (r = 1, 2, \cdots, S-1, S+1, \cdots, n;\; j = 1, 2, \cdots, R-1, R+1, \cdots, n).$$

Utilization of these equations gives

$$B = \begin{pmatrix} -4.5518 & 42.608 & -1.3298 & -19.155 \\ 0.5031 & -4.9191 & -0.05805 & 2.7560 \\ -0.1919 & 0.2014 & 0.3598 & -0.1021 \\ 4.7218 & -39.644 & 1.2153 & 17.547 \end{pmatrix}.$$


4. Concluding remarks. It is seen from equation (1) that if $\Delta a_{RS} = -1/b_{SR}$, that is, if $a_{RS}$ is increased by the negative of the reciprocal of the corresponding element in the transposed reciprocal matrix, then the denominator in the second term on the right-hand side of equation (1) becomes equal to zero, and $B$ cannot be found by the present method. It is left to the reader to verify that under these conditions $A$ is in fact singular.

In the illustrative numerical example, the denominator is only $1 - 2.3517 \times 0.4 = 0.0593$, which accounts for the large magnitude of some of the elements of $B$. If $\Delta a_{24}$ were taken to be 1/2.3517 = 0.4252 instead of 0.4, $A$ would have become singular.

If two or more elements in the matrix $a$ are to be changed, the new inverse can be found by successive applications of the method.
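Equation (1) translates almost line for line into code. The following is a minimal sketch, assuming plain nested lists and 0-based indices (so the paper's $R = 2$, $S = 4$ become 1 and 3); the function name `adjust_inverse` is hypothetical.

```python
def adjust_inverse(b, R, S, delta):
    """Inverse of the matrix obtained from a by adding delta to a[R][S].

    b is the inverse of a, given as a list of rows; R and S are 0-based.
    """
    denom = 1.0 + b[S][R] * delta
    if denom == 0.0:
        # the changed matrix is singular in this case (see the concluding remarks)
        raise ZeroDivisionError("1 + b[S][R] * delta vanishes")
    n = len(b)
    return [
        [b[r][j] - b[r][R] * b[S][j] * delta / denom for j in range(n)]
        for r in range(n)
    ]


if __name__ == "__main__":
    b = [
        [0.2220, 2.5275, -0.1012, -1.4145],
        [-0.04806, -0.2918, -0.1999, 0.7079],
        [-0.1692, 0.01195, 0.3656, -0.01824],
        [0.2801, -2.3517, 0.07209, 1.0409],
    ]
    B = adjust_inverse(b, R=1, S=3, delta=0.4)  # a_24 increased by 0.4
    for row in B:
        print([round(v, 4) for v in row])
```

Changing several elements amounts to calling the function repeatedly, each time feeding in the inverse returned by the previous call.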



A CLASS OF RANDOM VARIABLES


REFERENCES

[1] H. Hotelling, "Some new methods in matrix calculation," Annals of Math. Stat., Vol. 14 (1943), pp. 1-35.

[2] P. S. Dwyer, "The solution of simultaneous equations," Psychometrika, Vol. 6 (1941), p. 101.


A CLASS OF RANDOM VARIABLES WITH DISCRETE DISTRIBUTIONS 

Albert Noack 
Cologne, Germany 

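1. General results. A large class of random variables with discrete probability distributions can be derived from certain power series. Let

$$f(z) = \sum_{x=0}^{\infty} a_x z^x, \quad a_x \text{ real}, \; |z| < r.$$

We may have either non-negative coefficients $a_x$ or we may have $(-1)^x a_x \ge 0$. In the first case take $0 < z < r$; and in the second case take $-r < z < 0$. Define a random variable $\xi$ with the distribution

(1)   $P\{\xi = x\} = \dfrac{a_x z^x}{f(z)}$,   $x = 0, 1, 2, \cdots$.

The above conditions insure $P\{\xi = x\} \ge 0$ for all $x$; besides $\sum_x P\{\xi = x\} = 1$.

The distribution of $\xi$ may be called the power series distribution (p.s.d.).

The mean of such a distribution is

$$E(\xi) = \sum_x x\, P\{\xi = x\} = \frac{1}{f(z)} \sum_x x\, a_x z^x.$$

Hence it follows that

(2)   $E(\xi) = \mu_1' = z\, \dfrac{f'(z)}{f(z)} = z\, \dfrac{d}{dz} \log f(z)$.

We have for the moments about the origin

$$\mu_r' = \frac{1}{f(z)} \sum_x x^r a_x z^x,$$

and hence

$$z\, \frac{d\mu_r'}{dz} = \frac{1}{f(z)} \sum_x x^{r+1} a_x z^x - z\, \frac{f'(z)}{f(z)}\, \frac{1}{f(z)} \sum_x x^r a_x z^x.$$

Thus we have the recurrence relation

(3)   $\mu_{r+1}' = z\, \dfrac{d\mu_r'}{dz} + \mu_1' \mu_r'$.

The central moments are

$$\mu_r = \sum_x (x - \mu_1')^r P\{\xi = x\} = \frac{1}{f(z)} \sum_x (x - \mu_1')^r a_x z^x,$$

and hence

$$z\, \frac{d\mu_r}{dz} = \frac{1}{f(z)} \sum_x x (x - \mu_1')^r a_x z^x - r z\, \frac{d\mu_1'}{dz}\, \frac{1}{f(z)} \sum_x (x - \mu_1')^{r-1} a_x z^x - z\, \frac{f'(z)}{f(z)}\, \frac{1}{f(z)} \sum_x (x - \mu_1')^r a_x z^x.$$

The sum of the first and third term will be found to be $\mu_{r+1}$, hence

$$z\, \frac{d\mu_r}{dz} = \mu_{r+1} - r z\, \frac{d\mu_1'}{dz}\, \mu_{r-1},$$

whence we have for the central moments of a p.s.d. the recurrence relation

(4)   $\mu_{r+1} = z \left[ \dfrac{d\mu_r}{dz} + r\, \dfrac{d\mu_1'}{dz}\, \mu_{r-1} \right]$.

Putting $r = 1$, $\mu_0 = 1$, $\mu_1 = 0$, we get the variance of $\xi$

(5)   $\mu_2 = \sigma^2(\xi) = z\, \dfrac{d\mu_1'}{dz} = z\, \dfrac{d}{dz} \log f(z) + z^2\, \dfrac{d^2}{dz^2} \log f(z)$.

By (5), (4) assumes the form

(4')   $\mu_{r+1} = z\, \dfrac{d\mu_r}{dz} + r \mu_2 \mu_{r-1}$.

The characteristic function of $\xi$ is

$$\varphi(t) = \sum_x e^{itx} P\{\xi = x\} = \frac{1}{f(z)} \sum_x a_x e^{itx} z^x,$$

or

(6)   $\varphi(t) = \dfrac{f(z e^{it})}{f(z)}$.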

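To get a relation connecting the cumulants $\kappa_r$ and the moments $\mu_r'$ about the origin, we differentiate both sides of the identity

$$\sum_{r=1}^{\infty} \kappa_r \frac{(it)^r}{r!} = \log \sum_{r=0}^{\infty} \mu_r' \frac{(it)^r}{r!}$$

with respect to $(it)$; identifying coefficients in $(it)^{r-1}/(r-1)!$ we get¹

(7)   $\mu_r' = \sum_{j=1}^{r} \binom{r-1}{j-1} \mu_{r-j}'\, \kappa_j$.

Differentiation of (7) with respect to $z$ gives

(7')   $\dfrac{d\mu_r'}{dz} = \sum_{j=1}^{r} \binom{r-1}{j-1} \left[ \dfrac{d\mu_{r-j}'}{dz}\, \kappa_j + \mu_{r-j}'\, \dfrac{d\kappa_j}{dz} \right]$.

Substitution of (7) and (7') in (3) gives

$$\sum_{j=1}^{r+1} \binom{r}{j-1} \mu_{r+1-j}'\, \kappa_j = \sum_{j=1}^{r} \binom{r-1}{j-1} \left[ \left( z\, \frac{d\mu_{r-j}'}{dz} + \mu_1' \mu_{r-j}' \right) \kappa_j + z\, \mu_{r-j}'\, \frac{d\kappa_j}{dz} \right],$$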


or by (3) after a little re-arrangement 

( 8 ) 


2. Special cases.

(a) Choosing $f(z) = e^z$, $\xi$ has the Poisson distribution

(1a)   $P\{\xi = x\} = \dfrac{e^{-z} z^x}{x!}$.

(2) and (5) are the well known relations $E(\xi) = \sigma^2(\xi) = z$; the recurrence formula (4) assumes the form²

(4a)   $\mu_{r+1} = z \left[ \dfrac{d\mu_r}{dz} + r \mu_{r-1} \right]$.
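As a short worked check of the Poisson recurrence, read here as $\mu_{r+1} = z\,[\,d\mu_r/dz + r\mu_{r-1}\,]$, the next two central moments follow from $\mu_1 = 0$ and $\mu_2 = z$ alone. The steps below are an illustration added for the reader, not part of the original derivation.

```latex
\begin{align*}
\mu_3 &= z\left[\frac{d\mu_2}{dz} + 2\mu_1\right] = z\,[\,1 + 0\,] = z, \\
\mu_4 &= z\left[\frac{d\mu_3}{dz} + 3\mu_2\right] = z\,[\,1 + 3z\,] = z + 3z^2 .
\end{align*}
```

Both values agree with the familiar third and fourth central moments of the Poisson distribution.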

(b) Taking $f(z) = (1 - z)^{-k}$, $k > 0$, $0 < z < 1$, we get the so-called negative binomial distribution

(1b)   $P\{\xi = x\} = \dbinom{k + x - 1}{x} z^x (1 - z)^k$,   $x = 0, 1, 2, \cdots$.

The mean is

(2b)   $\mu_1' = \dfrac{kz}{1 - z}$,

while the recurrence formula for the central moments is

(4b)   $\mu_{r+1} = z \left[ \dfrac{d\mu_r}{dz} + \dfrac{kr}{(1 - z)^2}\, \mu_{r-1} \right]$,

hence the first three moments of this distribution are

(5b)   $\mu_2 = \dfrac{kz}{(1 - z)^2}$,   $\mu_3 = \dfrac{kz(1 + z)}{(1 - z)^3}$,   $\mu_4 = \dfrac{kz(1 + 4z + z^2 + 3kz)}{(1 - z)^4}$.

The characteristic function of the distribution is

(6b)   $\varphi(t) = \left( \dfrac{1 - z}{1 - z e^{it}} \right)^k$.

Writing $z = \eta/(1 + \eta)$, $k = h/\eta$, $\eta > 0$, $h > 0$, we get the so-called Polya-Eggenberger distribution for rare contagious events³

(1b₁)   $P\{\xi = x\} = \dfrac{\Gamma(h/\eta + x)}{x!\, \Gamma(h/\eta)} \left( \dfrac{\eta}{1 + \eta} \right)^x (1 + \eta)^{-h/\eta}$,   $x = 0, 1, 2, \cdots$.

The first four moments of this distribution are

(2b₁)   $\mu_1' = h$,

(5b₁)   $\mu_2 = h(1 + \eta)$,
        $\mu_3 = h(1 + \eta)(1 + 2\eta)$,
        $\mu_4 = h(1 + \eta)[1 + 3(1 + \eta)(h + 2\eta)]$.

To obtain a recurrence relation for the moments consider

$$\frac{d\mu_r}{dz} = \frac{\partial \mu_r}{\partial \eta} \frac{d\eta}{dz} + \frac{\partial \mu_r}{\partial h} \frac{dh}{dz} = (1 + \eta)^2 \left[ \frac{\partial \mu_r}{\partial \eta} + \frac{h}{\eta} \frac{\partial \mu_r}{\partial h} \right],$$

hence we find for this distribution by (4) and (4b)

(4b₁)   $\mu_{r+1} = (1 + \eta) \left[ \eta\, \dfrac{\partial \mu_r}{\partial \eta} + h\, \dfrac{\partial \mu_r}{\partial h} + r h\, \mu_{r-1} \right]$.

It follows from (4b₁) that $\mu_r$ is a polynomial in $\eta$ and $h$. The characteristic function of this distribution is

(6b₁)   $\varphi(t) = [1 + \eta(1 - e^{it})]^{-h/\eta}$.

¹ Cf. M. G. Kendall, The Advanced Theory of Statistics, Vol. I, p. 87.
² Cf. Craig, Am. Math. Soc. Bull., Vol. 40 (1934), p. 262.
³ Cf. Zeits. f. angew. Math. und Mech., Vol. 3 (1923), pp. 279-289.

(c) The coefficients of the series $-\log(1 - z) = \sum_{x=1}^{\infty} z^x / x$ are positive; the associated distribution derived is

(1c)   $P\{\xi = x\} = -\dfrac{z^x}{x \log(1 - z)}$,   $0 < z < 1$;  $x = 1, 2, \cdots$,

and has the mean

(2c)   $E(\xi) = -\dfrac{z}{(1 - z) \log(1 - z)}$.

Recurrence formula (4) takes for this distribution the form

(4c)   $\mu_{r+1} = z\, \dfrac{d\mu_r}{dz} - r z\, \dfrac{\log(1 - z) + z}{(1 - z)^2 [\log(1 - z)]^2}\, \mu_{r-1}$,

while the variance and the characteristic function of this distribution are

(5c)   $\mu_2 = -z\, \dfrac{\log(1 - z) + z}{(1 - z)^2 [\log(1 - z)]^2}$,

(6c)   $\varphi(t) = \dfrac{\log(1 - z e^{it})}{\log(1 - z)}$.
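The closed forms for the logarithmic series case can be checked against direct summation of the defining probabilities $a_x z^x / f(z)$. The sketch below is illustrative: the helper `psd_mean_variance`, the truncation at 200 terms, and the test value $z = 0.5$ are all assumptions made here, and the closed forms used are the mean $-z / ((1-z)\log(1-z))$ and variance $-z(\log(1-z) + z) / ((1-z)^2 [\log(1-z)]^2)$ obtained from (2) and (5).

```python
import math


def psd_mean_variance(coeff, z, terms=200):
    """Mean and variance of P{xi = x} = a_x z^x / f(z) by truncated summation."""
    weights = [coeff(x) * z**x for x in range(terms)]
    f = sum(weights)  # truncated value of f(z)
    mean = sum(x * w for x, w in enumerate(weights)) / f
    second = sum(x * x * w for x, w in enumerate(weights)) / f
    return mean, second - mean * mean


def log_series_coeff(x):
    """Coefficients of -log(1 - z): a_0 = 0, a_x = 1/x for x >= 1."""
    return 0.0 if x == 0 else 1.0 / x


def log_series_closed_form(z):
    """Mean and variance of the logarithmic series distribution in closed form."""
    L = math.log(1.0 - z)
    mean = -z / ((1.0 - z) * L)
    variance = -z * (L + z) / ((1.0 - z) ** 2 * L**2)
    return mean, variance


if __name__ == "__main__":
    z = 0.5
    print(psd_mean_variance(log_series_coeff, z))
    print(log_series_closed_form(z))
```

The same helper can be pointed at any other coefficient sequence, provided the truncated series has converged for the chosen $z$.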

(d) The coefficients of the series $\log \dfrac{1 + z}{1 - z} = 2 \sum_{x=0}^{\infty} \dfrac{z^{2x+1}}{2x + 1}$ are positive, so we can derive a random variable $\xi$ with the distribution

(1d)   $P\{\xi = 2x + 1\} = \dfrac{2 z^{2x+1}}{(2x + 1) \log \dfrac{1 + z}{1 - z}}$,   $0 < z < 1$;  $x = 0, 1, 2, \cdots$.

$\xi$ has the mean

(2d)   $E(\xi) = \dfrac{2z}{(1 - z^2) \log \dfrac{1 + z}{1 - z}}$,

the recurrence formula (4) assumes the form

(4d)   $\mu_{r+1} = z\, \dfrac{d\mu_r}{dz} + 2 r z\, \dfrac{(1 + z^2) \log \dfrac{1 + z}{1 - z} - 2z}{(1 - z^2)^2 \left[ \log \dfrac{1 + z}{1 - z} \right]^2}\, \mu_{r-1}$,

while the variance and the characteristic function of $\xi$ are

(5d)   $\mu_2 = 2z\, \dfrac{(1 + z^2) \log \dfrac{1 + z}{1 - z} - 2z}{(1 - z^2)^2 \left[ \log \dfrac{1 + z}{1 - z} \right]^2}$,

(6d)   $\varphi(t) = \dfrac{\log(1 + z e^{it}) - \log(1 - z e^{it})}{\log(1 + z) - \log(1 - z)}$.

(e) Likewise the coefficients of the series

$$\sin^{-1} z = z + \sum_{x=1}^{\infty} \frac{1 \cdot 3 \cdot 5 \cdots (2x - 1)}{2 \cdot 4 \cdot 6 \cdots (2x)}\, \frac{z^{2x+1}}{2x + 1}$$

are positive; the derived variable $\xi$ with the distribution

$$P\{\xi = 1\} = \frac{z}{\sin^{-1} z},$$

(1e)   $P\{\xi = 2x + 1\} = \dfrac{1 \cdot 3 \cdots (2x - 1)}{2 \cdot 4 \cdot 6 \cdots (2x)}\, \dfrac{z^{2x+1}}{(2x + 1) \sin^{-1} z}$,   $0 < z < 1$;  $x = 1, 2, 3, \cdots$,

has the mean

(2e)   $E(\xi) = \dfrac{z}{\sqrt{1 - z^2}\, \sin^{-1} z}$.

The recurrence formula for the moments

(4e)   $\mu_{r+1} = z\, \dfrac{d\mu_r}{dz} + r z\, \dfrac{\sin^{-1} z - z \sqrt{1 - z^2}}{(1 - z^2)^{3/2} (\sin^{-1} z)^2}\, \mu_{r-1}$

gives the variance

(5e)   $\mu_2 = z\, \dfrac{\sin^{-1} z - z \sqrt{1 - z^2}}{(1 - z^2)^{3/2} (\sin^{-1} z)^2}$.

The characteristic function assumes the form

(6e)   $\varphi(t) = \dfrac{\sin^{-1}(e^{it} z)}{\sin^{-1} z}$.

(f) It is well known that series (b), (c), (d), and (e) are special cases of the hypergeometric function $F(a, b; c; z)$. This function gives a p.s.d. if $abc > 0$. If $a > 0$, $b > 0$, $c > 0$ or if $a < 0$, $b < 0$, $c > 0$, $a$, $b$ integers, there exist no further restrictions on these parameters. Suppose $a < 0$, $b < 0$, $c > 0$, $a$ integer, $b$ not; we must have $[b]$⁴ $< a$; if neither $a$ nor $b$ are integers, we must have $[a] = [b]$. Suppose $a < 0$, $b > 0$, $c < 0$. If $c$ is an integer, $a$ must be an integer $> c$. If $a$ is an integer, but $c$ not, we must have $[c] < a$. Finally if neither $a$ nor $c$ are integers, we must have $[a] = [c]$. Corresponding conditions are valid if $a > 0$, $b < 0$, $c < 0$. Regarding

$$\frac{d}{dz} F(a, b; c; z) = \frac{ab}{c}\, F(a + 1, b + 1; c + 1; z),$$

the mean of a random variable $\xi$ with hypergeometric distribution is

(2f)   $E(\xi) = z\, \dfrac{ab}{c}\, \dfrac{F(a + 1, b + 1; c + 1; z)}{F(a, b; c; z)}$.

Considering the differential equation

$$z(1 - z) f''(z) + [c - (a + b + 1) z] f'(z) - ab\, f(z) = 0,$$

(5) gives the variance of $\xi$

(5f)   $\sigma^2(\xi) = \dfrac{abz}{c(1 - z)} \left\{ c + [1 - c + (a + b) z]\, \dfrac{F(a + 1, b + 1; c + 1; z)}{F(a, b; c; z)} - \dfrac{ab}{c}\, z(1 - z) \left[ \dfrac{F(a + 1, b + 1; c + 1; z)}{F(a, b; c; z)} \right]^2 \right\}$.

The higher moments of this distribution can now be derived from (4').

⁴ $[b]$ means as usual the greatest integer $\le b$.


- 2(1 - 2 ) - 
c 



GEOMETRIC RANGE


THE GEOMETRIC RANGE FOR DISTRIBUTIONS OF CAUCHY'S TYPE
By E. J. Gumbel and R. D. Keeney
New York City and Metropolitan Life Insurance Company

1. Introduction. We consider large samples drawn from a symmetrical unlimited population whose distribution is of the Cauchy type, defined by the properties

(1)   $\lim_{x \to \infty} x^k [1 - F(x)] = A$,   $\lim_{x \to -\infty} (-x)^k F(x) = A$,

where $k$ and $A$ are positive and $F(x)$ stands for the probability function. This type of distribution has no moments of an order equal to or greater than $k$. We construct the distribution of a certain function of the extreme values, and require only the knowledge of the type of the initial distribution, not of the distribution itself.

From each sample we pick out the largest and smallest observations, $x_n$ and $x_1$. If the median of the initial distribution is zero, and the sample size is large enough, the probability of any extreme $x_n$ or $-x_1$ being negative can be neglected. If we draw $N$ such samples, each of large size $n$, we obtain $N$ pairs of extremes, $x_{n,\nu}$ and $x_{1,\nu}$ $(\nu = 1, 2, 3, \cdots, N)$. For each sample we can then compute the geometric mean, $\rho$, of these extremes:

(2)   $\rho = \sqrt{x_n (-x_1)}$,

which we henceforth call the geometric range.

The distribution of these geometric ranges can be obtained directly from the joint asymptotic distribution of the extremes. However, it is easier to obtain this distribution indirectly from the distribution of the reciprocal of the geometric range. This distribution of the reciprocal is of interest in itself: since it possesses all moments we can use it to estimate the parameters by the method of moments, whereas this problem seems to be very intricate if we start from the distribution of the geometric range itself.

2. The distribution of the reciprocal of the geometric range. The distribution of the reciprocal of the geometric range follows from a theorem of Elfving [1] which may be stated thus:

"Let $x$ be a symmetrical unlimited variate with probability $F(x)$. Let $\xi$ be defined by

(3)   $\xi = 2n \sqrt{F(x_1)[1 - F(x_n)]}$.

Then the asymptotic density function $g(\xi)$ and the asymptotic probability $G(\xi)$ of $\xi$ are:

(4)   $g(\xi) = \xi K_0(\xi)$;   $G(\xi) = 1 - \xi K_1(\xi)$,

where $K_0$ and $K_1$ are the modified Bessel functions of the second kind and of order zero and one."

Introducing instead of $A$ the parameter $u$ defined by $F(u) = 1 - 1/n$ we have, from (1), approximately for large $n$

(5)   $F(x_1) = \dfrac{1}{n} \left( \dfrac{u}{-x_1} \right)^k$,   $1 - F(x_n) = \dfrac{1}{n} \left( \dfrac{u}{x_n} \right)^k$.

For the variable $\xi$ in Elfving's theorem, we obtain asymptotically

(6)   $\xi / 2 = u^k \rho^{-k}$.

We attach a subscript $k$ to $\xi$ to show its dependence on $k$. The moments of $\xi_k$ are obtained from a formula given by Watson ([3], p. 388) as

(7)   $\overline{\xi_k^{\,l}} = 2^l\, \Gamma^2(1 + l/2)$,

and all moments of this variate exist.

3. Estimate of parameters. From $N$ sets, each of $n$ observations, we pick out the largest and the smallest, $X_{n,\nu}$ and $X_{1,\nu}$. We subtract from each observed extreme the central value, $m$, of the $Nn$ observations. If each $x_{n,\nu} = X_{n,\nu} - m > 0$ and $x_{1,\nu} = X_{1,\nu} - m < 0$ the sample size is large enough.

Define $\eta = 1/\rho$. The first two moments of $\eta$ are, from (7),

(8)   $\bar{\eta} = \dfrac{1}{u}\, \Gamma^2(1 + 1/2k)$,   $\overline{\eta^2} = \dfrac{1}{u^2}\, \Gamma^2(1 + 1/k)$.

Elimination of the parameter $u$ from these two equations leads to

$$\frac{\overline{\eta^2}}{\bar{\eta}^2} = \frac{\Gamma^2(1 + 1/k)}{\Gamma^4(1 + 1/2k)}.$$

In terms of the coefficient of variation, $V$, this equation becomes

(9)   $1 + V^2 = \Gamma^2(1 + 1/k) / \Gamma^4(1 + 1/2k)$.

Substituting the value of $V$ computed from the observations, we obtain an estimate of $k$, and hence can obtain an estimate of $u$ from (8). This procedure is facilitated by Table 1.
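The relation between $V$ and $1/k$, read here as $1 + V^2 = \Gamma^2(1 + 1/k) / \Gamma^4(1 + 1/2k)$, can be evaluated and inverted numerically in place of table lookup. The sketch below is illustrative only: the function names are hypothetical, and the bisection assumes that $V$ increases with $1/k$ over the search interval, as the tabled values suggest.

```python
import math


def variation_from_order(inv_k):
    """Coefficient of variation V of the reciprocal geometric range for a given 1/k."""
    ratio = math.gamma(1.0 + inv_k) ** 2 / math.gamma(1.0 + inv_k / 2.0) ** 4
    return math.sqrt(max(ratio - 1.0, 0.0))


def order_from_variation(v, lo=0.01, hi=6.0, tol=1e-10):
    """Solve for 1/k by bisection, assuming V increases with 1/k on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if variation_from_order(mid) < v:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for inv_k in (0.10, 0.50, 1.00, 2.00):
        print(inv_k, round(variation_from_order(inv_k), 3))
    print(round(order_from_variation(0.788), 2))
```

With an estimate of $1/k$ in hand, $u$ follows from the first moment as $u = \Gamma^2(1 + 1/2k) / \bar{\eta}$.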

4. The distribution of the geometric range. From a practical standpoint
the geometric range itself is preferable to its reciprocal since it is easier to interpret
and easier to calculate from the observed extremes. We want to establish its
distribution $g_1(\rho)$. From the relation (6) of $\rho$ to $\xi_k$ and the knowledge of the
distribution (4) of $\xi_k$ we find

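(10) $G_1(\rho) = 1 - G(\xi_k) = 2u^k\rho^{-k} K_1(2u^k\rho^{-k})$
and, by differentiation,

(11) $g_1(\rho) = 4k\,u^{2k}\rho^{-2k-1} K_0(2u^k\rho^{-k}).$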






Since tables of these Bessel functions are available [2], the various probabilities
and densities may be evaluated.

The simplest way to compare geometric ranges to the theory is the use of a
probability paper (Figure 1). For its construction, consider the linear relation

(12) $\log \rho = \log u + (\log 2)/k - (\log \xi_k)/k$

obtained from (6). Consequently we plot $-\log \xi_k$ on the abscissa and write the
corresponding values $G_1(\rho)$, formula (10), on a horizontal axis. An upper parallel
to the abscissa shows the return periods. The observed geometric ranges are
plotted on the ordinate in a logarithmic scale. If the theory holds, the observed
geometric ranges should be scattered about the straight line (12).
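Where tables of the Bessel functions are not at hand, the probability (10) and the abscissa of the probability paper can be approximated numerically. The sketch below is only an illustration: it evaluates $K_\nu(x)$ from the integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt$ by the trapezoidal rule, and all function names, the step size and the bisection bounds are assumptions of this example rather than anything given in the paper.

```python
import math

def bessel_k(nu, x, h=0.01):
    """Modified Bessel function of the second kind K_nu(x), x > 0.

    Trapezoidal rule on K_nu(x) = integral_0^inf exp(-x cosh t) cosh(nu t) dt.
    """
    total = 0.5 * math.exp(-x)            # t = 0 term with trapezoid weight 1/2
    t = h
    while x * math.cosh(t) < 700.0:       # past this point the integrand is negligible
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
        t += h
    return h * total

def prob_geometric_range(rho, u, k):
    """G_1(rho) from (10), with xi = 2 u^k rho^(-k) as in (6)."""
    xi = 2.0 * (u / rho) ** k
    return xi * bessel_k(1, xi)

def xi_for_probability(p, lo=1e-6, hi=50.0):
    """Solve xi * K_1(xi) = p by bisection; xi * K_1(xi) decreases from 1 to 0."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid * bessel_k(1, mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def abscissa(p):
    """Position of the probability p on the paper: -log10(xi)."""
    return -math.log10(xi_for_probability(p))
```

With these definitions `prob_geometric_range(u, u, k)` returns about .2797 for any k, and `xi_for_probability(0.5)` is close to 1.257.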


TABLE 1

The order k and the variation V of the reciprocal of the geometric range

  Reciprocal order   Coefficient of     Reciprocal order   Coefficient of
        1/k           variation V             1/k           variation V

        .10              .088                 .70              .556
        .12              .104                 .80              .632
        .16              .138                 .90              .709
        .20              .171                 .98              .772
        .30              .251                1.00              .788
        .40              .332                2.00             1.73
        .50              .404                4.00             5.92
        .60              .480                6.00            20.0


If less accurate estimates of u and k than those obtainable by the systematic
methods (8) and (9), or the probability paper, will suffice, quick estimates can be
obtained from the quantiles of the sample of geometric ranges. To the value
$\rho = u$ corresponds, according to (6), $\xi_k = 2$, whence, from the tables [2],
$G_1(u) = 2K_1(2) = .27973$. From N observed geometric ranges arranged in increasing
magnitude we thus may pick out the mth, $\rho_m$, with the rank m = .28 N and
use it as an estimate $\hat{u} = \rho_m$. For the medians $\tilde{\xi}_k$ and $\tilde{\rho}$ we get
$\tilde{\xi}_k = 1.257$ from the tables, and thus, by (6), $\tilde{\rho}^{\,k} = 1.591\,u^k$. This formula
provides a quick estimate of k. We pick out the median $\tilde{\rho}$ of the N observed
geometric ranges. Since we have an estimate of u, we obtain an estimate of k from

(13) $\frac{1}{k} = \frac{\log \tilde{\rho} - \log u}{\log 1.591} = 4.960 \log\,[\tilde{\rho}/\rho_m].$
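The quick estimates can be sketched in a few lines of Python. This is a minimal illustration under two assumptions not stated explicitly above: the logarithms in (13) are common logarithms (which agrees with $1/\log_{10} 1.591 \approx 4.96$), and the rank .28 N is rounded to the nearest integer. The function name is hypothetical.

```python
import math
import statistics

def quick_estimates(geometric_ranges):
    """Quantile estimates of u and k from N observed geometric ranges."""
    ordered = sorted(geometric_ranges)
    n_obs = len(ordered)
    m = max(1, round(0.28 * n_obs))        # rank m = .28 N, since G_1(u) = .27973
    u_hat = ordered[m - 1]                 # the m-th smallest value estimates u
    rho_median = statistics.median(ordered)
    inv_k = 4.960 * math.log10(rho_median / u_hat)   # equation (13)
    return u_hat, 1.0 / inv_k
```

The sketch assumes the sample median exceeds the value of rank m; otherwise (13) gives a zero or negative 1/k and the final division fails or returns a meaningless value.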


5. Analogy between the geometric range and the range. A study of the various
characteristics of the geometric range for distributions of Cauchy’s type reveals 
structural similarities to the range for distributions of the exponential type. 



[Figure 1. Probability paper for geometric ranges.]

















This is not altogether surprising, since (as shown in Table 2) after the appropriate
transformations the probabilities of both are identical functions of the respective
transformed variates.

Of course the two systems are mutually exclusive; if the observed ranges can
be reproduced by the first system we conclude that all moments in the initial
distribution exist. If, on the other hand, the observed geometric ranges can be
represented by the second system we conclude that no moments of an order
greater than k exist.


TABLE 2

RANGES AND GEOMETRIC RANGES

Type of initial distribution   Exponential                                       Cauchy

Variate                        Range                                             Geometric range
Definition                     $w = x_n + (-x_1)$                                $\rho = \sqrt{x_n(-x_1)}$
Transformation                 $z = 2\exp[-\frac{\alpha}{2}(x_n - x_1 - 2u)]$    $\xi_k = 2u^k\rho^{-k}$
Logarithm                      $\lg z = \lg 2 - \frac{\alpha}{2}(x_n - x_1 - 2u)$  $\lg \xi_k = \lg 2 - \frac{k}{2}(\lg x_n + \lg(-x_1) - 2\lg u)$
Probability                    $\Phi(w) = z K_1(z)$                              $G_1(\rho) = \xi_k K_1(\xi_k)$
Distribution                   $g(w) = \frac{\alpha}{2} z^2 K_0(z)$              $g_1(\rho) = \frac{k}{\rho}\,\xi_k^2 K_0(\xi_k)$
Median                         $\tilde{w} = 2u + .9286/\alpha$                   $2\lg\tilde{\rho} = 2\lg u + .9286/k$
Mean                           $\bar{w} = 2u + 2\gamma/\alpha$                   $\lg\bar{\eta} = -\lg u + 2\lg\Gamma(1 + 1/2k)$


REFERENCES 

[1] G. Elfving, "The asymptotical distribution of range in samples from a normal
population," Biometrika, Vol. 35 (1947).

[2] Tables of the Bessel-functions, Vol. 6, British Association for the Advancement of
Science, Cambridge, 1937.

[3] G. N. Watson, Theory of Bessel-functions, Cambridge University Press, 1944.


REMARK ON W. M. KINCAID'S “NOTE ON THE ERROR IN 
INTERPOLATION OF A FUNCTION OF TWO 
INDEPENDENT VARIABLES” 

By T. N. E. Greville
Federal Security Agency 

In a review of Dr. W. M. Kincaid's “Note on the Error in Interpolation of a 
Function of Two Independent Variables," (Annals of Math. Stat., Vol. 19 (1948),
pp. 85-88) which appeared in Mathematical Reviews, Vol. 9 (1948), p. 470, I
stated that "a more simple and elegant, and equally general, expression is
obtainable by a simple adaptation of formula (41), p. 215, of J. F. Steffensen's
book, Interpolation."

This statement is not entirely correct and is also misleading in its implications
since Dr. Kincaid's expressions are actually more general in certain respects, and
simplicity and generality are not the only considerations nor, in this case, the
most important ones. In setting up an expression for the remainder in an
interpolation formula, the primary objective is to secure an efficient appraisal of the
remainder. In this respect, Dr. Kincaid's expressions are superior as they involve
only the higher derivatives of the function it is desired to represent, whereas
Steffensen's method would always involve a first derivative term in such a way
as to prevent any refinement of estimates of the error by introducing additional
given values.


REMARK ON MY PAPER “ON A THEOREM OF HSU AND ROBBINS” 

By P. Erdös
Syracuse University 

Professor Robbins kindly pointed out that in my paper mentioned in the title
(Annals of Math. Stat., Vol. 20 (1949), pp. 286-291) I have misquoted a statement
in the paper of Hsu and Robbins ("Complete Convergence and the Law of
Large Numbers," Proc. Nat. Acad. of Sci., Vol. 33 (1947), pp. 25-31). I attribute
to Hsu and Robbins the conjecture (notations of my paper) that if
$\sum_{n=1}^{\infty} M_n < \infty$ then (1) and (2) hold, and proceed to give a counter example.
However, the conjecture of Hsu and Robbins is not the above false one but the
following: If $\sum_{n=1}^{\infty} M_n < \infty$ and (1) holds then (2) also holds. This conjecture is
true and is in fact proved in my paper.

Professor Robbins also points out that a slight modification of my theorem
can be stated in a more concise form as follows: Let $X_1, X_2, \cdots$ be a sequence of
independent random variables having the same distribution function F(x), and let

$Y_n = (1/n)(X_1 + \cdots + X_n).$

Then the necessary and sufficient condition that

$\sum_{n=1}^{\infty} \Pr\{|Y_n| > \epsilon\} < \infty$ for every $\epsilon > 0$,

is that

$\int_{-\infty}^{\infty} x\,dF(x) = 0, \qquad \int_{-\infty}^{\infty} x^2\,dF(x) < \infty.$





ABSTRACTS OF PAPERS 

(Abstracts of papers presented at the New York meeting of the Institute,
December 27-30, 1949)

1. The Asymptotic Distribution of the Extremal Quotient. E. J. Gumbel, New
York, and R. D. Keeney, Metropolitan Life Insurance Company, New York.

The extremal quotient is the ratio of the largest to the absolute value of the smallest
observation. Its analytical properties for symmetrical, continuous and unlimited
distributions are obtained from a study of the auto-quotient defined as the ratio of two
non-negative variates with identical distributions. The relation of the two statistics is
established by proving that, for sufficiently large samples from an initial distribution with
median zero, the largest (or smallest) value may be assumed to be positive (or negative) and
that the extremes are independent. The logarithm of the extremal quotient has asymptotically
a symmetrical distribution. Its median is unity. As many moments exist for the extremal
quotient as moments and reciprocal moments exist simultaneously for the initial variate.
For the exponential type of initial distributions, the asymptotic distribution of the
extremal quotient can only be expressed by a complicated integral which may be
approximated in the interval 1/2 < q < 2 by the logarithmically transformed normal probability
function. In this case, no moments exist. For the Cauchy type, the asymptotic distribution
of the extremal quotient is very simple. The logarithm of the extremal quotient has the
same (logistic) distribution as the midrange for initial distributions of exponential type.
For both initial types, the asymptotic distributions of the extremal quotients possess one
parameter which may be estimated from the observations.

2. A Second Formula for Partial Sums of Hypergeometric Series having the
Unit as Fourth Argument. Hermann von Schelling, Naval Medical Research
Laboratory, U. S. Submarine Base, New London, Conn.

If the arguments $\alpha$ and $\beta$ are changed after the summation, published Ann. Math. Stat.
Vol. 20 (1949), p. 120, and this method is applied a second time, a new formula results for
partial sums of $F(\alpha, \beta, \gamma; 1)$. A simple recurrence formula is developed for these partial
sums. The new equation is a numerical short cut, as is demonstrated with an example.

3. A Coverage Distribution. Herbert Solomon, Office of Naval Research,
Washington, D. C.

Consider a fixed target circle of radius $T_R$ and center at a distance R from an aiming
point. Let N circles each of radius $W_R$ be dropped at the aiming point with their centers
subject to a bivariate normal distribution with circular symmetry, the common standard
deviation denoted by $\sigma$. Define y as the set theoretical sum of the N random circles with
the fixed circle and let c be the ratio of y to the total area of the fixed circle. Then it is
desired to find $P_{c_0}$ where

$P_{c_0} = P\{c \ge c_0 \mid T_R, W_R, R, N\}$

where $T_R$, $W_R$, and R are in $\sigma$ units. Define $R^* = W_R + aT_R$ where $a = a(c, W_R, T_R)$,
$|a| \le 1$. It is shown that for N = 1, the family of curves in the $RR^*$ plane
defined by $P_{c_0}$ = constant have a slope, m, given by

$m = \frac{I_1(RR^*)}{I_0(RR^*)}$

where $I_k$ is the modified Bessel function of kth order. In fact as the product
$RR^*$ approaches infinity, m approaches unity. From these results, the contours of equal
probability are easily determined. When N > 1, overlap considerations make the
computation of explicit values for $P_{c_0}$ intractable. However, in this case, upper and lower
bounds for $P_{c_0}$ can be obtained.


4. The Problem of the Greater Mean. R. R. Bahadur and Herbert Robbins,
University of North Carolina, Chapel Hill.

"Optimum" solutions (in the sense of Wald's theory of statistical decision functions)
are obtained for the "problem of the greater mean". Let $\pi_i$ (i = 1, 2) be normal
populations with means $m_i$ and common variance $\sigma^2$, all unknown, and denote the arbitrary but
given set of possible parameter points $\omega = (m_1, m_2; \sigma)$ by $\Omega$. Suppose that a set of
$n_1 + n_2$ independent observations is drawn, $n_i$ from $\pi_i$, and let
$v = (x_{11}, \cdots, x_{1n_1}; x_{21}, \cdots, x_{2n_2})$ denote the sample point. Any measurable function
f(v) such that $0 \le f(v) \le 1$ is called a decision function. Given a "risk function"
$r(f \mid \omega)$ defined for all f and all $\omega \in \Omega$, a decision function f*(v) is "optimal" if
(i) $\sup_\omega r(f^* \mid \omega) = \inf_f \sup_\omega r(f \mid \omega)$, and (ii) no decision
function is "uniformly better" than f*(v). If f*(v) is the unique (up to sets of measure 0)
decision function with property (i), it is "optimum". Case 1. Given any decision function
f(v) and any $\omega \in \Omega$, let


$r(f \mid \omega) = \max[m_1, m_2] - m_1 E[f \mid \omega] - m_2 E[1 - f \mid \omega].$

Let

$f^0(v) = 1$ if $\bar{x}_1 > \bar{x}_2$,
$f^0(v) = 0$ otherwise.


It is shown that under certain conditions on $\Omega$, $f^0(v)$ is optimum. Case 2. Given any decision
function which takes on only the values 0 and 1, corresponding to the two decisions
"$m_1 < m_2$" and "$m_2 < m_1$" respectively, and any $\omega \in \Omega$, let

$r(f \mid \omega) = P(\text{incorrect decision} \mid \omega, f).$

It is shown that under certain conditions on $\Omega$, $f^0(v)$ is optimal. The conditions on $\Omega$ are
very similar in the two cases, and are likely to be satisfied in most applications. However,
it is shown by examples that there exist non-degenerate types of $\Omega$ with respect to which
decision functions other than $f^0(v)$ are uniformly better than $f^0(v)$. The methods
of the paper can be applied to a number of similar problems.

5. Some Extensions of Bayes' Theorem. F. C. Leone, Case Institute of
Technology, Cleveland 6, Ohio.

There is some past or a priori knowledge about the quality of a population of lots and a
sample is taken from a random lot. What can be said about the lot from which this sample
is taken? We are incorporating the results of our experiment or sample with the previous
knowledge to form a judgment. From the a priori distribution and a sample of n with c
defectives, say two in twenty-five, we form an a posteriori distribution of all two in
twenty-five ... answer questions such as: "What is the a posteriori probability that a lot
producing a two in twenty-five result should have a proportion ... distributions as the
rectangular, triangular, normal, Pearson's Type III and Type I. ... considering lot quality
on one ... inspection, the a priori distributions of these data are mostly ... In some cases
a Pearson Type I proves to be a good fit for the a priori distribution.





6. On Optimum Selections from Multinormal Populations. Z. W. Birnbaum
and D. G. Chapman, University of Washington, Seattle.

Let $(X, Y_1, \cdots, Y_n)$ have an (n + 1)-dimensional non-singular normal probability
density $f(X, Y_1, \cdots, Y_n)$. By "selection" in $(Y_1, \cdots, Y_n)$ we shall understand a
measurable function $\varphi(Y_1, \cdots, Y_n)$ such that $0 \le \varphi \le 1$ for all $Y_1, \cdots, Y_n$. By a
"truncation in $(Y_1, \cdots, Y_n)$ to the set $\Omega$" we understand a selection $\varphi(Y_1, \cdots, Y_n)$
such that $\varphi = 1$ for $(Y_1, \cdots, Y_n)$ in $\Omega$, and $\varphi = 0$ in $\bar{\Omega}$. A "linear truncation" will
be a truncation to a set defined by a condition of the form $\sum_{i=1}^{n} c_i Y_i \ge k$. Using a
slight generalization of Neyman-Pearson's fundamental lemma, the following theorems are
proven: among selections for which the expectation of X, after selection, assumes a fixed
value, the one which maximizes the "retained" portion of the universe
$\int \cdots \int \varphi(Y_1, \cdots, Y_n) f(X, Y_1, \cdots, Y_n)\,dX\,dY_1 \cdots dY_n$ is a linear truncation.
Among all the selections for which a given quantile of X, after selection, assumes a fixed
value, the one which maximizes the retained portion of the universe is a linear truncation.
(Research under the sponsorship of the Office of Naval Research.)

7. Simple Regression Analysis with Autocorrelated Disturbances. Howard L. 
Jones, Illinois Bell Telephone Company, Chicago. 

When the disturbances in a regression equation are connected by a linear difference
equation, the parameters of both equations can be estimated simultaneously by
maximizing a function that describes the joint probability of the disturbances or a linear
function thereof. This note discusses a simple example.

8. A Test of Klein's Model III for Changes of Structure. A. W. Marshall,
The Rand Corporation, Santa Monica, Calif.

This paper suggests a test of equations from linear stochastic equation systems on the
basis of observations not included in the original computation period. Rejection regions
of approximately the right size (asymptotically correct) are constructed and the use of
naive economic models as an auxiliary test is suggested. The procedure is applied to
Klein's Model III; the results are tabulated and discussed.

9. An Application of the Theory of Extreme Values to Economic Problems.
S. B. Littauer, Columbia University, and E. J. Gumbel, New York.

Most studies of economic time series have been concerned with establishing regularities
of behavior, often by analogy with mechanical systems. Much as regularity in economic
phenomena is desirable, such evidence as has been available leaves the reality of
this sought for regularity considerably in doubt. It seems more fruitful rather to ask the
question, "What is the pattern of the non-regularity" and, if reasonably answered, to offer
some verifiable form of explanation therefor. It seems further desirable that any attempt
at "scientific" explanation of economic phenomena be fortified by evidence of statistical
stability supported by criteria such as were established by Shewhart for the control of
quality of manufactured product. In the present instance certain concepts of experimental
inference, which seem natural therefor, are employed in order to give some general and
plausible unity to the behavior of economic time series.

Following upon the postulates of the theory presented here, the appropriate formal
development employs concepts of statistical quality control and of the statistical theory
of extreme values. Within this theory the importance of the absence of statistical stability
is emphasized, and the relevance of the use of concepts in extreme values is made evident.
By introducing a superuniverse, peaks and troughs are random expressions of a super
chance "cause" system. The use of these statistical concepts is not motivated by mere
analogy but rather as the natural means for explanation of the phenomena studied.

A number of examples of the application of these statistical methods to selected series
are offered as evidence of the workability of the theory here presented. The extremes of
the Dow-Jones index of selected industrials show that the 1928 value was completely
outside the previous levels and should not have been considered as a "stable high plateau
basic for perpetual prosperity". Instead this should have suggested the imminent
breakdown. The validity of the application of the theory of extreme values to these phenomena
is not so strongly substantiated as are the many applications that have been made of them
to flood frequencies, wind velocities, extreme temperatures, breaking strengths and other
natural phenomena. Nevertheless the results here obtained are highly suggestive of a
tenable economic hypothesis.

10. Bias Due to the Omission of Independent Variables in Ordinary Multiple
Regression Analysis. (Preliminary Report). T. A. Bancroft, Iowa State
College, Ames.

Given n observations of the dependent variate y and the independent variates $x_1$,
$x_2, \cdots, x_r$, k < r, all variates measured from their respective sample means,
suppose we have calculated the ordinary regression of y on the first k variates and of y on
all r variates. We define ordinary multiple regression as the single-equation approach, with
error only in y, which is assumed normally and independently distributed with zero mean and
variance $\sigma^2$, the $x_i$ being fixed from sample to sample.

In order to determine whether to omit or retain the last (r - k) independent variates
we formulate a rule of procedure: calculate Snedecor's

$F = \frac{\text{Reduction in } \Sigma y^2 \text{ due to } (r - k) \text{ variates}/(r - k)}{\text{Error mean square after fitting all } r \text{ variates}}.$

If F is non-significant at some assigned significance level $\alpha$, we pool the sums of squares
and degrees of freedom, involved in the numerator and denominator of F, to obtain an
estimate of the error variance $\sigma^2$, and fit y on the first k variates only. If F is significant
at the assigned significance level we use the denominator only in F for our estimate of
$\sigma^2$ and hence fit y on all r variates.
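The rule of procedure can be written out as a short Python sketch. It is only an illustration of the pooling rule as described in the abstract: the function name, the argument names and the choice to pass the tabulated critical value of F as an argument are assumptions of this example.

```python
def error_variance_estimate(ss_reduction, df_extra, ss_error, df_error, f_critical):
    """Preliminary-test rule: pool when F is non-significant, otherwise do not.

    ss_reduction : reduction in the sum of squares due to the last (r - k) variates
    df_extra     : r - k, the degrees of freedom of that reduction
    ss_error     : residual sum of squares after fitting all r variates
    df_error     : degrees of freedom of the residual sum of squares
    f_critical   : tabulated F value at the assigned significance level
    """
    f_ratio = (ss_reduction / df_extra) / (ss_error / df_error)
    if f_ratio < f_critical:
        # Non-significant: pool sums of squares and degrees of freedom.
        pooled = (ss_reduction + ss_error) / (df_extra + df_error)
        return pooled, "first k variates"
    # Significant: use the denominator of F only.
    return ss_error / df_error, "all r variates"
```

Because the estimate switches between two formulas according to the outcome of the test, its expectation generally differs from the error variance, which is the bias the abstract studies.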

The object of this investigation is to determine the bias in our estimate $e^2$ of $\sigma^2$ if we
follow such a rule of procedure. The bias turns out to be


2a^X 


ni -f nt 


" + 




where 


Xt 


nt 


-f~ Hi a* 


5 W’ 

f-Ai+l 
2v« ’ 


$n_1$ and $n_2$ are the respective degrees of freedom for the numerator and denominator of F.

$\lambda$ is a function of the population regression coefficients $\beta_{k+1}, \cdots, \beta_r$. The bias
is discussed for selected values of the parameters involved.





11. Estimating Parameters of Pearson Type III Populations From Truncated
Samples. A. C. Cohen, Jr., The University of Georgia, Athens.

The method of moments is employed with "singly" truncated random samples (1) to
estimate the mean, $\mu$, and the standard deviation, $\sigma$, of a Pearson Type III population
when $\alpha_3$ is known and (2) to estimate $\mu$, $\sigma$, and $\alpha_3$ when only the form of the distribution
is known in advance. No information is assumed to be available about the number of
variates in the omitted portion of the sample. The results obtained can be readily
applied to practical problems with the aid of "Salvosa's Tables of Pearson's Type III
Function." An illustrative example is included in the paper.

12. The Cyclical Normal Distribution. E. J. Gumbel, New York.

The usual normal distribution becomes invalid for variates, like an angle, lying on
the circumference of a circle. The distribution of such variates was established by
R. von Mises by the same methods as used for the classical derivation. The cyclical normal
distribution is symmetrical about a mode and antimode. The probability function is
proportional to an incomplete Bessel function of the first kind and of order zero for an
imaginary argument, and contains two parameters, the direction of the resultant vector and a
parameter k linked to the absolute amount of the vector. The parameters may be estimated
by the method of maximum likelihood. For k = 0, the distribution degenerates into a
uniform cyclical distribution. If k is of the order 3, the distribution approaches the linear
normal one, k being the reciprocal of the variance. With increasing values of k, the
distribution loses its cyclical character and becomes concentrated in a narrow strip. This
distribution holds for symmetrical unimodal values varying according to pure chance
about a unique mode in a closed space (as the angles of the wind directions) or a closed
time, and gives a theoretical model for the variations of temperatures, pressures,
rainfalls, storms, discharges, floods, death and birth rates over the year, and earthquakes
over the day. The comparison between theory and observations in plotting the square
roots of the frequency on polar coordinate paper provides a statistical criterion for the
regularity of cyclical phenomena. (Work done in part under contract W 44/109/QM/2202
with the Research and Development Branch, Office of the Quartermaster General.)

13. Treatment of Attenuation Problems by Random Sampling. H. Kahn and
T. Harris, The Rand Corporation, Santa Monica, Calif.

Exact analytical calculations of the transmission of energy by particles through shields
are difficult; to avoid them random sampling methods may be resorted to. The
straightforward procedure of simulating life histories of particles, using random number tables,
may be used for thin shields, but in the case of thick shields with tremendous attenuations,
tremendous numbers of particles would be required. In order to obtain reasonably small
standard errors, using reasonable numbers of simulated life histories, it is necessary to
modify the original problem to one having a lower attenuation factor, the solution bearing
a known relation to the solution of the original problem. Alternatively, this may often
be regarded as an application of well known statistical sampling procedures, such as
representative sampling or importance sampling. Various special procedures can be devised.
One of the first was the splitting technique due to J. v. Neumann. Among others may be
mentioned the exponential transformation, a simple analytic transformation of the
original problem into one having a much lower attenuation factor.
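The idea of replacing the original problem by one with lower attenuation can be illustrated on a deliberately simplified model. The Python sketch below is not the authors' procedure: it assumes a purely absorbing shield, so that the transmission probability is $P(X > a) = e^{-a}$ for a free path X with unit exponential distribution, and it uses ordinary importance sampling from an exponential distribution with a smaller rate. The function name, the parameter `stretch` and the seed are hypothetical.

```python
import math
import random

def transmission_estimate(thickness, n_histories, stretch, seed=0):
    """Estimate P(X > thickness) for X ~ Exp(1) by sampling from Exp(rate=stretch).

    With stretch < 1 the sampled free paths are longer, so many histories
    penetrate the shield; each penetrating history is weighted by f(x)/g(x).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_histories):
        x = rng.expovariate(stretch)          # draw from g(x) = stretch * exp(-stretch * x)
        if x > thickness:
            # Weight f(x)/g(x) with f(x) = exp(-x), the original free-path density.
            total += math.exp(-x) / (stretch * math.exp(-stretch * x))
    return total / n_histories
```

For a shield twenty mean free paths thick, direct simulation would need on the order of $e^{20}$ histories to see a single transmission, whereas `transmission_estimate(20.0, 20000, 1 / 20.0)` should land within a few per cent of $e^{-20}$.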

14. On the Existence of Nearly Locally Best Unbiased Estimates. Herman
Rubin, Stanford University, Stanford, Calif.

For any family $\mathcal{F}$ of distributions, and any distribution $P_0$ of $\mathcal{F}$ there exists a bilinear
function K whose arguments are all parameters defined for all distributions of $\mathcal{F}$ and for
which there exist unbiased estimates which have finite variance if $P_0$ is the true
distribution, and which has the following properties: (1) If $\theta$ is any parameter in the domain of
K and t is any unbiased estimate of $\theta$, then $\mathrm{var}(t \mid P_0) \ge K(\theta, \theta)$. (2) This result is best
possible, i.e., for any $\theta$ there is an unbiased estimate t of $\theta$ whose variance differs from
$K(\theta, \theta)$ by less than any preassigned amount.

15. The Experimental Evaluation of Multiple Definite Integrals. George W.
Taylor, U. S. Army Electronics Laboratory, San Diego, Calif.

When one is forming an estimate of the total, or mean value, of some quantity,
sampling at carefully selected points will frequently be preferable to employing a method
which involves randomization. The estimation of the total volume of water in a given
lake or the amount of energy being released in a given time and space are examples of
problems where specified points for sampling should result in a reduction in the error of
estimate. These and similar problems lead naturally to numerical integration methods.
In the case of single integrals, Gauss' and Tchebychef's formulae yield maximum efficiency
with respect to controlling the polynomial error and statistical error respectively, but
often the Newton-Cotes formulae can be applied more conveniently.

For the evaluation of double integrals, an eight point and a thirteen point formula for
fifth degree accuracy and a twelve point and a twenty-one point formula for seventh
degree accuracy have been developed for integrating over a rectangle, and similar formulae
have been developed for integrating over areas bounded by a parabola and a straight
line or by two parabolas. The following system of equations is employed in developing
these formulae:


m 

L 


“h i,j for which ^ + j < 2n, 


and where Cy « 


a»V 

(% + 1 ) 0 ‘ + 1 ) 


for both % and j even, 


= 0 otherwise. 


Formulae for the numerical evaluation of triple integrals taken over a
rectangular parallelopiped are developed, including a twenty-one point formula with fifth
degree accuracy. It is shown that comparable formulae can be developed for integrating
functions of more than three variables and a 2n + 1 point formula with third degree
accuracy for integrating a function of n variables over a rectangular n-space is obtained.
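To make the last statement concrete, here is one possible 2n + 1 point rule of third degree on the cube $[-1, 1]^n$, written as a Python sketch. The abstract does not give the author's points or weights, so the choice below (the centre plus the 2n face centres, with weights obtained by matching the integrals of 1 and $x_i^2$) is an assumption of this example; odd monomials and products $x_i x_j$ are integrated exactly by symmetry.

```python
def cube_rule_degree3(f, n):
    """Integrate f over the cube [-1, 1]^n with 2n + 1 points (exact to third degree).

    Points: the centre and the 2n face centres (+1 or -1 on one axis, 0 elsewhere).
    Weights: volume/6 at each face centre, volume*(1 - n/3) at the centre.
    """
    volume = 2.0 ** n
    w_face = volume / 6.0
    w_centre = volume * (1.0 - n / 3.0)
    total = w_centre * f([0.0] * n)
    for axis in range(n):
        for sign in (-1.0, 1.0):
            point = [0.0] * n
            point[axis] = sign
            total += w_face * f(point)
    return total
```

For n = 1 this reduces to Simpson's rule. For n > 3 the centre weight is negative, which is one reason other point sets may be preferred in practice.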

16. Tests of Fit of a Cumulative Distribution Function over Partial Range of
Sample Data. Bradford F. Kimball, New York State Dept. of Public
Service, New York.

Case 1. Sample data are completely ordered over range tested.

Let the n + 1 true frequency differences associated with an ordered random sample of
n values of x be denoted by $u_j$. The cdf of a theoretical test function based on n of the
above frequency differences is identified and methods of approximating it are discussed.

Case 2. Sample data in k ordered groups over range tested.

Let $\Delta_i F$ denote the true frequency differences over the k sample intervals to be covered
by the test. Let $m_i$ denote the number of unit frequency differences $u_j$ covered by the ith
interval. Define M and W by

$M + 1 = \sum_k m_i, \qquad M \le n,$

$W = \sum_k \Delta_i F, \qquad W \le 1.$
k 





A theoretical function Z is defined by

$Z = \frac{(M + 1)(M + 2)}{k - 1} \sum_k \frac{[\Delta_i F - m_i W/(M + 1)]^2}{m_i}.$

Set

$Y = Z/W^2.$

The cdf of Y is identified and methods of approximation to it are discussed.

Applications to testing agreement of sample with hypothetical cdf of universe are
considered for both cases in some detail.

17. Large Sample Tests for Comparing Percentage Points of Two Arbitrary
Continuous Populations. A. W. Marshall and J. E. Walsh, The Rand
Corporation, Santa Monica, Calif.

Let us consider two continuous populations, the first with density function f(x) and
$100\alpha\%$ point $\theta_\alpha$, the second with density function g(x) and $100\beta\%$ point $\phi_\beta$. These two
populations are arbitrary except that $f(\theta_\alpha) \ne 0$, $g(\phi_\beta) \ne 0$ and both $f'(\theta_\alpha)$, $g'(\phi_\beta)$ exist
and are continuous in the vicinity of the specified points. This paper presents significance
tests for $\theta_\alpha - \phi_\beta$ which are based on large samples from these populations. The exact
significance level of a test is not known but its value is bounded within reasonably close limits
(asymptotically). Efficiency properties of these tests (compared to the corresponding
noncentral t-tests) are investigated for the case in which both populations are normal
and the ratio of variances is known. Results are also derived for simultaneously testing
$\theta_\alpha - \phi_\beta$ and $f(\theta_\alpha)/g(\phi_\beta)$. These tests have known significance levels (asymptotically). A
particular application of tests of this type occurs when it is desired to test whether two
samples came from the same population and agreement of the two populations in a specified
region is to be emphasized. For this special case, the significance levels of the resulting
tests are reasonably accurate for moderate as well as large sized samples.

18. On the Distribution of Wald's Classification Statistic. H. L. Harter,
Michigan State College, East Lansing.

A study is made of the distribution of the classification statistic introduced by Wald.
The exact distribution of V in the univariate case, as obtained by the use of characteristic
functions and contour integration, is given for both degenerate and non-degenerate cases.
The problem of classifying an individual into one or the other of two populations, using
the statistic V, is discussed. In the multivariate case, examples are given of the
distribution of an approximation to V suggested by Wald. The procedure here consists in integrating
out two variables from the joint distribution of three variables to find the distribution of
the third. Four cases arise, depending upon whether the sample size and the number of
variates are even or odd. Since this approximation is valid only for large samples, an
attempt is made to find an approximation which is asymptotically equivalent to it as the
sample size increases, but which is valid also for small samples. Results are given for a
sampling experiment performed to determine an empirical distribution of V for a specific
small sampling case, using a population of 10,000 pieces modeled after Shewhart's normal
bowl. Obstacles in the path of practical applications are discussed.

19. Analysis of Extreme Values. W. J. Dixon, University of Oregon, Eugene.

Consider a population $N(\mu, \sigma^2)$ contaminated by introducing a certain proportion of
values from a population $N(\mu + \lambda\sigma, \sigma^2)$ or $N(\mu, \lambda^2\sigma^2)$. The performance of various statistics
for discovering these contaminators is assessed by sampling methods for samples of size 6
and 15. (This research was sponsored by the Office of Naval Research.)





20. A Note on the Variance of Truncated Normal Distributions. A. C. Cohen,
Jr., The University of Georgia, Athens.

Formulas are derived whereby the variance of truncated normal distributions can
readily be computed with the aid of an ordinary table of areas and ordinates of the normal
frequency function. These results are applicable to certain tolerance problems involved
in Statistical Quality Control. Their use will enable one to make computations required in
solving such problems without resorting to Karl Pearson's relatively inaccessible tables
of "Values of the Incomplete Normal Moment Functions".
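A computation of this kind can be sketched in Python using only the areas and ordinates of the normal curve. The abstract does not reproduce the author's formulas, so the sketch relies on the standard textbook expressions for the first two moments of a standard normal variate truncated to $a \le z \le b$; the function names are hypothetical.

```python
import math

def phi(z):
    """Ordinate of the standard normal frequency function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def big_phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_normal_variance(a, b, sigma=1.0):
    """Variance of N(0, sigma^2) truncated to a <= x/sigma <= b."""
    area = big_phi(b) - big_phi(a)
    mean = (phi(a) - phi(b)) / area                       # E[z] after truncation
    second = 1.0 + (a * phi(a) - b * phi(b)) / area       # E[z^2] after truncation
    return sigma * sigma * (second - mean * mean)
```

For limits at one standard deviation on either side of the mean, the variance falls to about 29 per cent of its untruncated value.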


21. Some Estimates and Tests Based on the r Smallest Values in a Sample
(By Title). J. E. Walsh, The Rand Corporation, Santa Monica, Calif.

Let us consider a situation where only the r smallest values of a sample of size n are
available. This paper investigates the case where n is large and r is of the form $pn + O(\sqrt{n})$.
Properties of some well known estimates and tests of the 100p% population point (based
on statistics of the type used for the sign test) are investigated. If the sample is from a
normal population, these nonparametric results have high efficiencies for small values of
p (at least 95% if p < 1/10). The other investigations are restricted to the case of a
normal population. Asymptotically "best" estimates and tests of the population percentage
points are derived for the case where the population variance is known. If the population
variance is unknown, asymptotically most efficient estimates and tests can be obtained
for the smaller population percentage points by suitable choices of p and $O(\sqrt{n})$. The
results of the paper have application in the field of life testing. There the r smallest sample
values can be obtained without the necessity of obtaining the remaining sample values.
By starting with a larger number of units but stopping the experiment when only a small
percentage have "died", it is often possible to obtain the same amount of "information"
with a substantial saving in cost and time over that required by starting with a smaller
number of units but continuing until all have "died".


22. Some Comments on the Efficiency of Significance Tests (By Title).
J. E. Walsh, The Rand Corporation, Santa Monica, Calif.


A method sometimes used to measure the efficiency of a significance test consists in
associating a statistic with the test and defining the efficiency of the test to be the
efficiency of this statistic considered as an estimate. This paper investigates the power
function implications of this method of defining the efficiency of a test. Examples are presented
which show that an estimate efficiency of 100E% does not necessarily imply that the
corresponding most powerful test based on 100E% as many sample values has approximately
the same power function as the given test (for the admissible set of alternative
hypotheses). In several of the examples it was found that estimate efficiency makes no allowance
for the effect of significance level while the relationship between the power functions of
the given test and the corresponding most powerful test changes noticeably with respect
to significance level. Some of the examples are non-asymptotic while others
are asymptotic. However, results are obtained for the asymptotic case which indicate that
this equality of power functions does hold for a rather broad class of significance tests if
the pertinent statistics have distributions which are asymptotically normal.





Columbia University. Without the application of a sampling procedure the problem can only be
solved either by a complete physical inventory which is very costly, or by a cycle check which takes 
many years to complete. By use of the sequential sampling method, results of desired accuracy are 
obtained quickly and at very low cost since an extremely small percentage of field inspection for the 
mass property accounts of any large utility produces satisfactory conclusions. 


NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest. 

Personal Items 

Dr. Ralph A. Bradley accepted an appointment as Assistant Professor in the
Mathematics Department of McGill University, Montreal, Canada after receiving
his Ph.D. in mathematical statistics at the University of North Carolina
in June, 1949.

Mr. Fred J. Clark, Jr. received his master of science degree in mathematics
from the University of Illinois in August, 1949 and is now employed by the
University of California at the Sandia Laboratory in Albuquerque, New Mexico.

Professor J. L. Doob is on leave from the University of Illinois to teach at
Cornell University for the academic year 1949-1950.

Mark W. Eudey obtained his Ph.D. degree in statistics at the University of
California, Berkeley, and is now Vice President of California Municipal
Statistics, Inc.

Dr. Joseph L. Hodges, Jr. has been promoted to Assistant Professor and Research
Associate at the Statistical Laboratory, University of California, Berkeley.

Professor Paul Horst, formerly of the Department of Psychology, University of 
Washington, is now Director of Research at the Educational Testing Service, 
Princeton, New Jersey. 

Dr. Fred C. Leone, formerly an Instructor and a Research Fellow at Purdue
University, has been appointed Instructor in the Mathematics Department and 
Director of the Statistical Laboratory at the Case Institute of Technology. 

Mr. Fred W. Lott, who has been studying at the University of Michigan for 
his Ph.D., has accepted an assistant professorship at Iowa State Teachers College, 
Cedar Falls, Iowa. 

Dr. Francis McIntyre has resigned as Director of Export Control, Office of 
International Trade, U. S. Department of Commerce, Washington, D. C. to 
accept a post as Director of Economic Research, California Texas Oil Co., 661 
Fifth Avenue, New York, New York. 

Mr. R. B. Murphy, who has been a graduate student at Princeton University,
has accepted an instructorship in the Mathematics Department of Carnegie
Institute of Technology.

Professor Jerzy Neyman, Director of the Statistical Laboratory, University of 
California at Berkeley, will be on sabbatical leave for the Spring Semester, 1950. 

Mr. Monroe L. Norden, formerly of the Glenn L. Martin Co., is now a Mathematical
Statistician with the Operations Research Office, Johns Hopkins University,
Ft. Lesley J. McNair, Washington 25, D. C.





Mr. D. Martin Sandelius, formerly a Research Assistant in the Institute of
Statistics, Uppsala, Sweden, has been appointed Lecturer in the Mathematics
Department, University of Washington, Seattle, for the academic year 1949-1950.

After completing his graduate work at Ohio State University, Dr. William J. 
Schull accepted a position with the Atomic Bomb Casualty Commission. He is
now in Japan as a geneticist working on follow-up studies at Hiroshima. 

Miss Elizabeth L. Scott obtained her Ph.D. degree in statistics at the University
of California, Berkeley and was promoted to Lecturer and Research Associate
at the Statistical Laboratory.

Miss Esther Seiden obtained her Ph.D. degree at the University of California,
Berkeley and was promoted to Lecturer and Research Associate at the Statistical
Laboratory.

Mr. Irving H. Siegel is on leave from his position as Chief Economist at the
Veterans Administration until June 30, 1950, to serve as Lecturer in Political
Economy at the Johns Hopkins University and as a member of the Johns Hopkins
University Operations Research Office staff.

Dr. Charles M. Stein, Assistant Professor and Research Associate at the
Statistical Laboratory, University of California, Berkeley, will be on leave for the
academic year 1949-1950 and will be working in Paris as a National Research
Fellow.


Alfred James Lotka 

Alfred James Lotka, a Fellow of the Institute, died in Red Bank, New Jersey,
on December 5, 1949. He was born of American parents in Poland, March 2, 1880,
and had his early schooling in France. His academic training was received at
Birmingham, England (B.Sc., 1901, and D.Sc., 1912), Cornell (M.A., 1909), and
Johns Hopkins (1922-1924). Dr. Lotka came to the Statistical Bureau of the
Metropolitan Life Insurance Company in 1924 and retired as Assistant Statistician
in 1947. His major contributions were his highly original work on the mathematical
theory of evolution, on the mathematical analysis of population, and on
the theory of self-renewing aggregates. Altogether, Dr. Lotka had almost 100
papers in these fields in technical and scientific journals, both here and abroad.
The essentials of his work are summarized in his books, "Elements of Physical
Biology" and "Théorie analytique des associations biologiques." He was, in
addition, a joint author on several books in the field of public health.

Dr. Lotka was a past president of the American Statistical Association and of
the Population Association of America, and Vice-President of the International
Union for the Study of Population.

Statistical Summer Session in Berkeley, Calif. 

Following the established pattern, there will be held this year a Statistical
Summer Session at the University of California, Berkeley. The faculty will
include William G. Cochran of Johns Hopkins University, Benjamin Epstein of
Wayne University, Erich L. Lehmann of the University of California, Paul Lévy
of the Ecole Polytechnique, Paris, France and Gottfried E. Noether of New York
University.

Courses will be offered on both the graduate and the undergraduate levels. The
graduate courses, all given during the First Summer Session, June 19 to July 29,
are meant primarily for students who either have already obtained their Ph.D.
degree or are working toward it. No specific prerequisites to graduate courses will
be required. The graduate program includes (i) a course on design of experiments
and a seminar on analysis of variance by W. G. Cochran, (ii) a course on theory
of estimation by E. L. Lehmann, and (iii) a course and a seminar on random
variables and random functions by Paul Lévy.

Inquiries should be addressed to the Office of the Summer Sessions, lA
Administration Building, University of California, Berkeley 4, California.


At a meeting of its Executive Council, AAPOR has laid plans for its 1950
meetings to be held jointly with the World Association for Public Opinion
Research (WAPOR) at Lake Forest College, near Chicago, June 16 to 20.

The program which is now being planned will be designed to fit the needs of 
the Association’s membership, which is composed of leaders in both the academic 
and commercial fields. 


The Council of the Institute of Mathematical Statistics requested Professor
Harold Hotelling to communicate to Professor S. S. Wilks its appreciation of
his editorship of the Annals during the years 1938 to 1949. On the recommendation
of the Council Professor Hotelling's letter is reproduced below.

January 6, 1950 

Professor Samuel S. Wilks
Fine Hall
Princeton, New Jersey

Dear Professor Wilks:

In behalf of the Council of the Institute of Mathematical Statistics and by its
direction, I write to express the appreciation we all feel for the splendid efforts
which you have expended so freely upon the Annals of Mathematical Statistics,
and which have been so conspicuously successful in establishing it as a sound
and reputable journal. The years of your editorship are memorable ones for the
history of statistics, and your contribution to making them so is of first
importance.

Very sincerely, 

Harold Hotelling 


New Members 

The following persons have been elected to membership in the Institute
(August 23, 1949 to November 30, 1949):

Anderson, Oskar, Ph.D. (Kiel) Professor, University of Munich, Königinstrasse 69,
Munich (München), Germany.





Puente Arroyo, Felix Jorge, CPA (Univ. Nal. Litoral) Profesor titular, Mathematics,
Italia ISSO, Rosario, Republica Argentina.

Arvanitis, Ernest A., A.B. (Boston Univ.) Student at Columbia University, 4S~IS 401^
Street, Sunnyside, L. I., New York.

Bhatt, Narbheshanker M., Ph.D. (Edinburgh Univ.) Professor of Statistics, Commerce
College, Behind Raopura Tower, Baroda, India.

Bose, Raj Chandra, D. Litt. (Calcutta Univ.) Professor of Mathematical Statistics,
University of North Carolina, 110 Noble Street, Chapel Hill, North Carolina.

Carreiro, Oscar Edtwaldo Porto, Civil Engineer (Univ. of Brasil) Professor da Faculdade
de Ciencias Economicas, Avenida Sao Sebastiao 266, Sao Paulo, Brasil.

Crump, Phelps P., B.S. (Iowa State) Graduate Student and Research Assistant, Box S4^,
State College Station, Raleigh, North Carolina.

Davis, Richard L., B.S. (North Carolina State) Sales Engineer, Box 304, Charlotte, North
Carolina.

Dickman, Sidney, A.B. (Brooklyn College) Graduate Student at Columbia University,
3833 West SBih Street, Brooklyn 34, New York.

Fitzgerald, Rev. John F., S.J., M.S. (Univ. of Detroit) Assistant Professor of Physics and
Mathematics, College of the Holy Cross, Worcester 3, Massachusetts.

Godsey, Ellis B., B.S. (Indiana Univ.) Analytical Statistician, Army Chemical Corps,
Ills Pin Oak Road, Baltimore 4, Maryland.

Ghurye, S. G., M.Sc. (Univ. of Bombay) Student and assistant, Department of Mathematical
Statistics, c/o The Institute of Statistics, Phillips Hall, Chapel Hill, North
Carolina.


Gutt, Paul, M.S. (Univ. of Chicago) Ordnance Research iHl, Mathematician, 8431 S.
Ellis, Chicago, Illinois.

Harman, Harry H., S.M. (Univ. of Chicago) Chief, Statistical Research and Analysis
Unit, Personnel Research Section, AGO, Dept. of the Army, 4111 Maryland Ave.
(Brookmont), Washington 18, D. C.

Henderson, Charles R., Ph.D. (Iowa State) Associate Professor, Animal Husbandry
Department, Cornell University, Ithaca, New York.

Harter, Harman Leon, Ph.D. (Purdue Univ.) Assistant Professor of Mathematics,
Michigan State College, East Lansing, Michigan.

Hoffman, William Charles, M.A. (Univ. of Calif. at Los Angeles) Graduate Assistant,
Department of Mathematics, Cornell University, Ithaca, New York.

Hydeman, William Robert, M.A. (Syracuse Univ.) Mathematician, U. S. Navy Department,
S810-sm Street, N.W., Washington 18, D. C.

Kellerer, Hans, Ph.D., Referent, Bayerisches Statistisches Landesamt, München 8,
Rosenheimerstr. 130, Germany.

Kramer, Kenneth H., M.S. (Carnegie Inst. of Tech.) Teaching Assistant at Carnegie
Institute of Technology, 379 Seneca Street, Turtle Creek, Pennsylvania.

Lieberman, Gerald J., M.A. (Columbia Univ.) Engineer and Mathematical Statistician,
Statistical Engineering Laboratory, National Bureau of Standards, Washington.

[Name illegible], Demonstrator in Mathematics, Statistical
Laboratory, St. Andrews Hill, Cambridge, England.

[Name illegible], Ph.D., U.C.O.F.S., Bloemfontein, South Africa.

[Name illegible], of Statistical Department, State Serum Institute, [address illegible].

[Name illegible], Director General de Estadistica, Ministerio [illegible]; Mathematics,
Facultad Ciencias Economicas, Central University, Calle [illegible], Miranda, Venezuela.

Riggs, Charles L., [degree illegible] (Univ. of Kentucky) Assistant Professor of Mathematics,
Department of Mathematics, Kent State University, Kent, Ohio.





Saxer, Walter, Ph.D., Professor a.d. Eidg. Techn. Hochschule, Zurich, Goldbach-Küsnacht,
Switzerland.

Scobert, Whitney, M.S. (Univ. of Oregon) Associate Professor of Mathematics,
Mathematics Department, Idaho State College, Pocatello, Idaho.

Serfling, Robert E., Ph.D. (Univ. of Mich.) Senior Scientist, Officer in Charge, Statistical
Branch, Epidemiology Division, Communicable Disease Center, U. S. Public Health
Service, Atlanta, Georgia.

Steyn, Hendrik S., Ph.D. (Univ. of Edinburgh) Lecturer in Statistics, University of
Pretoria, SOB Fourth Private Avenue, Villieria, Pretoria, South Africa.

Zacharias, William B., A.M. (Univ. of Pennsylvania) Instructor in Mathematics, Temple
University, 1BS9 - 67th Avenue, Philadelphia S6, Pennsylvania.

Zeigler, R. E., Ph.D. (Univ. of Iowa) Associate Professor of Mathematics, Mathematics
Department, Bradley University, Peoria 6, Illinois.


REPORT OF THE NEW YORK MEETING OF THE INSTITUTE 

The twelfth Annual Meeting of the Institute of Mathematical Statistics was
held in New York City on December 27-30, 1949. Headquarters were at the
Biltmore Hotel where most of the sessions were held; one or more of the sessions
were held at the Hotel Commodore, the McAlpin Hotel, and the Governor Clinton
Hotel. The meeting was held in conjunction with the Annual Meeting of the
American Statistical Association, the American Association for the Advancement
of Science, the American Mathematical Society, the Econometric Society,
the Psychometric Society, the Mathematical Association of America, the
Association for Computing Machinery, and the American Psychological Association.
The following 214 members of the Institute attended:

F. S. Acton, P. H. Anderson, R. L. Anderson, T. W. Anderson, H. E. Arnold, K. J. Arnold,
Max Astrachan, R. R. Bahadur, E. W. Bailey, T. A. Bancroft, W. D. Baten, E. E. Blanche,
C. I. Bliss, R. C. Bose, A. H. Bowker, R. A. Bradley, Dorothy Brady, A. E. Brandt, I. D. J.
Bross, T. H. Brown, O. P. Bruno, P. T. Bruyere, R. W. Burgess, J. M. Cameron, B. H. Camp,
E. W. Cannon, S. D. Canter, Bernard Carol, O. S. Carpenter, Maria Castellani, Jack
Chassan, Randolph Church, Edmund Churchill, W. G. Cochran, A. C. Cohen, Jr., R. H. Cole,
E. P. Coleman, F. G. Cornell, Jerome Cornfield, C. C. Craig, M. T. Crapsey, J. F. Daly,
D. A. Darling, Besse B. Day, F. R. Del Priore, W. E. Deming, Philip Desind, W. J. Dixon,
C. W. Dunnett, Solomon Dutka, P. S. Dwyer, Benjamin Epstein, W. D. Evans, W. T.
Federer, William Feller, J. W. Fertig, Leon Festinger, C. H. Fischer, J. C. Flanagan, M. M.
Flood, L. R. Frankel, N. M. Franklin, H. A. Freeman, Bernard Friedman, Melitta L.
Garbuny, E. F. Gardner, M. A. Geisler, H. H. Germond, Leon Gilford, Abraham Golub, William
Gomberg, C. H. Graves, S. W. Greenhouse, J. A. Greenwood, Evelyn B. Grossman, H. T.
Guard, Carl Hammer, B. C. Hammond, H. H. Harman, T. E. Harris, Boyd Harshbarger,
H. L. Harter, W. A. Hendricks, L. H. Herbach, J. L. Hodges, Jr., Wassily Hoeffding, Helen
M. Humes, Harold Hotelling, Cuthbert Hurd, H. M. Hughes, W. R. Hydeman, S. M.
Ikhtiar-ul-Mulk, S. I. Isaacson, Marcus Jacobs, W. W. Jacobs, J. E. Jackson, Carol M. Jaeger,
J. B. Jeming, R. J. Jessen, H. L. Jones, Alice S. Kaitz, W. C. Kalinowski, Leo Katz, R. D.
Keeney, B. F. Kimball, Leslie Kish, Lila F. Knudsen, Paul Koditschek, C. F. Kossack, K.
H. Kramer, R. R. Kuebler, Jr., S. M. Kwerel, R. B. Ladd, Marguerite Lehr, F. C. Leone,
Joseph Lev, Howard Levene, G. J. Lieberman, Julius Lieblein, S. B. Littauer, Simon Lopata,
Irving Lorge, E. D. Lowry, L. H. Madow, W. G. Madow, Benjamin Malzberg, Joseph
Mandelson, B. S. Marks, Margaret P. Martin, J. W. Mauchly, P. J. McCarthy, Margaret Merrell,
Albert Mindlin, P. D. Minton, Robert Mirsky, A. M. Mood, Doris N. Morris, R. H. Morris,
Dorothy J. Morrow, J. W. Morse, J. E. Morton, Judith Moss, R. O. Moss, Frederick
Mosteller, C. M. Mottley, Hugo Muench, L. F. Nanni, Dona Newman, G. E. Noether, M. L.
Norden, J. A. Norton, Jr., H. W. Norton, E. G. Olds, P. S. Olmstead, A. L. O'Toole, W. R.
Pabst, Jr., R. E. Patton, Katherine Pease, G. W. Petrie, B. E. Phillips, E. W. Pike, Aditya
Prakash, Frank Proschan, J. E. Ilnup, L. J. Reed, J. S. Rhodes, P. R. Rider, H. G. Romig,
Norman Rudy, Marion M. Sandomire, F. E. Satterthwaite, Mary Ann Savas, M. A.
Schneiderman, Samuel Schweid, O. A. Shaw, G. D. Shellard, W. A. Shewhart, S. B. Shrikhande,
Harry Shulman, I. H. Siegel, Rosedith Sitgreaves, G. W. Snedecor, Herbert Solomon, D. E.
South, Mortimer Spiegelman, R. G. D. Steel, J. R. Steen, Arthur Stein, Joseph Steinberg,
F. P. Stephan, A. I. Sternholl, J. S. Stock, J. G. Strioby, J. V. Sturtevant, W. R. Thompson,
L. J. Tick, Gerhard Tintner, M. M. Torrey, J. W. Tukey, G. W. Tyler, S. A. Tyler, Uttam
Chand, D. F. Votaw, Jr., Helen M. Walker, W. A. Wallis, Samuel Weiss, E. L. Welker, D. R.
Whitney, Frank Wilcoxon, R. I. Wilkinson, S. S. Wilks, C. P. Winsor, M. A. Woodbury,
Holbrook Working.


The opening session on Tuesday, December 27, 9 A. M., held jointly with the
American Statistical Association and the American Mathematical Society, was
devoted to Operations Research, with Professor J. Steinhardt, Operations
Evaluation Group, Massachusetts Institute of Technology presiding. The following
papers were presented:


1. Topics on the Methodology of Operations Research. B. O. Koopman, Columbia
University.

2. Some Applications of the Mathematical Theory of Games. G. E. Kimball, Columbia
University.

3. Theory of Games. L. Gillman, Operations Evaluation Group, Massachusetts Institute
of Technology.

4. Development of Theories of Action. Ellis Johnson, Operations Research Office, The Johns
Hopkins University.

5. Some Industrial Applications of Operations Research. A. A. Brown, Operations
Evaluation Group, Massachusetts Institute of Technology.


At the second session, held jointly with the American Statistical Association,
at 2:30 P. M. on the opening day, Professor M. Loève, University of California,
gave a special invited address entitled Fundamental Limit Theorems in
Probability. The discussion was presented by Professor Will Feller of Cornell
University and Professor H. E. Robbins of the University of North Carolina.
Professor Abraham Wald of Columbia University served as chairman.

The first contributed papers session was held on the same day at 4:00 P. M.,
with Professor W. D. Baten of Michigan State College and Michigan Agricultural
Experiment Station as chairman. The following papers were presented:

1. The Asymptotic Distribution of the Extremal Quotient. E. J. Gumbel, New York, and R.
D. Keeney, Metropolitan Life Insurance Company, New York.

2. A Second Formula for Partial Sums of Hypergeometric Series Having Unity as Fourth
Argument. Hermann von Schelling, Naval Medical Research Laboratory, New London,
Connecticut.

3. A Coverage Distribution. Herbert Solomon, Office of Naval Research, Washington,
D. C.

4. The Problem of the Greater Mean. R. R. Bahadur and Herbert Robbins, University of
North Carolina.





5. Some Extensions of Bayes' Theorem. F. C. Leone, Case Institute of Technology.

6. On Optimum Selections from Multinormal Populations. Z. W. Birnbaum and D. G.
Chapman, University of Washington.

On Wednesday morning, December 28, at 10:00 A. M. a session on Cybernetics
was held jointly with the American Statistical Association and the
American Mathematical Society. The following papers were given:

1. Technique of Multiple Prediction. Norbert Wiener, Massachusetts Institute of
Technology.

2. Stochastic Problems in Neurophysiology. Walter Pitts, Massachusetts Institute of
Technology.

3. Information Theory. Claude Shannon, Bell Telephone Laboratories.

Discussion was given by Professor J. L. Doob, University of Illinois, Professor Mark
Kac, Cornell University, and Professor L. J. Savage, University of Chicago.
Professor Jerzy Neyman, University of California, was Chairman of the session.

The session on Review of Statistical Methodology was held jointly with the
American Statistical Association at 2:00 P. M., Wednesday, December 28, with
Professor W. A. Wallis, University of Chicago, as chairman. The two papers
presented were: Review of Statistical Methodology in Agriculture and Related
Fields, by Professor W. T. Federer, Cornell University, and Recent Developments
in Statistical Methodology in Social Science, by Professor Frederick Mosteller,
Harvard University; discussion followed by Professor L. J. Savage of the
University of Chicago.

The second session of contributed papers was held jointly with the American 
Statistical Association and the Econometric Society on Thursday, December 
29, at 10:00 A. M., with Professor H. T. Davis of Northwestern University 
presiding. The following papers were presented: 

1. Simple Regression Analysis with Autocorrelated Disturbances. Howard Jones, Illinois
Bell Telephone Company.

2. Application of Sequential Sampling Method to Check the Accuracy of a Perpetual
Inventory Record. Joseph Jeming, New York City.

3. A Test of Klein's Model III for Changes of Structure. Andrew Marshall, Rand
Corporation.

4. Application of the Theory of Extreme Values to Economic Problems. S. B.
Littauer, Columbia University and E. J. Gumbel, New York City.

5. Bias Due to the Omission of Independent Variables in Ordinary Multiple Regression
Analysis. T. A. Bancroft, Iowa State College.

6. Estimating Parameters of Pearson Type III Populations from Truncated Samples. A. C.
Cohen, Jr., University of Georgia.

7. The Circular Normal Distribution. E. J. Gumbel, New York City.

The third session of contributed papers was held at 2:00 P. M. on Thursday,
December 29, with Professor L. A. Aroian of Hunter College as Chairman. The
following papers were presented in person or by title as indicated:

1. Treatment of Attenuation Problems by Random Sampling. H. Kahn and T. Harris, The
Rand Corporation.

2. On the Existence of Nearly Locally Best Unbiased Estimates. Herman Rubin, Stanford
University.





3. The Experimental Evaluation of Multiple Definite Integrals. George Tyler, Naval
Electronics Laboratory, San Diego, California.

4. Tests of Fit of a Cumulative Distribution Function Over Partial Range of Sample Data.
Bradford Kimball, New York State Department of Public Service, New York City.

5. Large Sample Tests for Comparing Percentage Points of Two Arbitrary Continuous
Populations. A. W. Marshall and John Walsh, The Rand Corporation.

6. On the Distribution of Wald's Classification Statistics. Harman L. Harter, Michigan
State College.

7. Analysis of Extreme Values. W. J. Dixon, University of Oregon.

8. A Note on the Variance of Truncated Normal Distributions. (By title) A. C. Cohen, Jr.,
University of Georgia.

9. Some Comments on the Efficiency of Significance Tests. (By title) John Walsh, The Rand
Corporation.

10. Some Estimates and Tests Based on the r Smallest Values in a Sample. (By title) John
Walsh, The Rand Corporation.


The subject of the next session, 4:00 P. M. Thursday, December 29, was the
Review of Stochastic Processes from the Point of View of Mathematical Statistics.
This session was held jointly with the American Statistical Association,
Professor C. C. Craig of the University of Michigan presiding. Two papers were
given, one by Professor H. B. Mann of the National Bureau of Standards, Ohio
State University and the University of California; and the second by Professor
John Tukey, Princeton University.

On Friday, December 30, at 9:00 A. M., a session on Statistical Methods in
Astronomy was held jointly with the American Statistical Association and Section
D of the American Association for the Advancement of Science. Professor
Walter Bartky of the University of Chicago, Chairman of the session, opened the
meeting with introductory remarks on Astronomical Problems Requiring
Statistical Methods. The following papers were presented:

1. The Nearby Stars. Peter van de Kamp, Swarthmore College.

2. Corrections to Observed Frequency Distributions. Bart J. Bok and J. K. De Jonge,
Harvard University.

3. The Problem of Selective Identifiability of Binaries. Elizabeth Scott, University of
California.

4. Multivariate Periodogram Analysis and Detection of Variable Stars. Harold Hotelling,
University of North Carolina.


These papers were discussed by Professor Jerzy Neyman, University of 
California. 

The session on Discriminant Functions in Education was held jointly with the
American Statistical Association, the American Psychological Association and
the Psychometric Society. Professor T. W. Anderson of Columbia University
gave an invited address on Classification by Multivariate Measures, followed by
discussion by Professors J. C. Flanagan of the University of Pittsburgh and
John Carroll of Harvard University. Professor Robert Thorndike of Columbia
University presided.

A session on Computation was held jointly with the
American Statistical Association and the Association for Computing
Machinery, Professor Harold Hotelling of the University of North Carolina
serving as Chairman. The following papers were given:





1. Idiosyncrasies of Automatically-sequenced Digital Computing Machines. Ida Rhodes,
National Bureau of Standards.

2. Problem Solving on Large-Scale Automatic Calculating Machines. W. D. Woo, Harvard 
University. 

3. A Statistical Application of the UNIVAC. John Mauchly, Eckert-Mauchly Computer 
Corporation. 

These papers were discussed by James McPherson, Bureau of the Census and 
Emil Schell, Office of the Air Comptroller. 

Meetings of the Council were held on Tuesday, December 27, at 12:00 Noon, 
Professor Jerzy Neyman presiding and again on Thursday, December 29, at 
12:00 Noon, Professor J. L. Doob presiding. The Business Meeting was held on 
Wednesday, December 28, Professor Jerzy Neyman presiding. The report of
this meeting is given elsewhere in this issue. 

S. B. Littauer,
Associate Secretary 


MINUTES OF THE ANNUAL MEMBERSHIP MEETING, NEW YORK,
DECEMBER 28, 1949


The meeting was called to order at 4:30 P.M. by President Jerzy Neyman.
The annual reports of the President, Editor, and Secretary-Treasurer were read. 
They are printed elsewhere in this issue. 

It was moved by Harold Hotelling that the front cover of the Annals in the 
future shall bear the additional notation that it was edited during the years 
1938-1949 by S. S. Wilks. Motion was seconded and carried unanimously. 

The tellers reported the election of the following officers: 

President-Elect: P. S. Dwyer

Members of the Council for 1950-1952: David Blackwell, W. G. Madow,
Frederick Mosteller, L. J. Savage

Meeting was adjourned at 5:15 P.M. 

Carl H. Fischer 
Secretary 


REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1949

I wish to begin my Report by welcoming the newly elected Fellows, Doctors
Z. W. Birnbaum, D. J. Finney, H. O. Hartley, Wassily Hoeffding, Michel
Loève, Edward Paulson and S. N. Roy. In addition, a hearty welcome is due
to Dr. G. W. Brown who was elected last year, but inadvertently omitted in the
published list. The election to the fellowship is a mark of recognition on the part
of the Institute. At the same time, I am sure the Institute has reason to be proud
of having among its fellows such distinguished scholars as are now added to the
list.





During the past year the intensity of the Institute's life grew markedly in
many respects. In particular, a very considerable number of our members took
part in various Committees. For the sake of brevity, the composition of all the
Committees is given in a tabular form at the end of the Report. At this time I
wish to express the indebtedness of the Institute to the Chairmen and to the
Members of all the Committees.

Undoubtedly the most important function of the members of the Institute is
research and the most important function of the Institute itself is the publication
of the results of this research. In this respect the past year brought about a
fundamental change: after a dozen years of hard and most fruitful work as
Editor of the Annals, Professor S. S. Wilks resigned this year and the Council
elected Professor T. W. Anderson as his successor. According to our present
Constitution, the term of office of the Editor is three years.

About a decade ago I suggested and the Membership Meeting of the Institute
approved that the cover of the Annals bear the name of its founder, Professor
Harry Carver. Founded by Carver, the Annals were developed by Wilks and
now stand as the most important statistical journal in the world. Accordingly
the Chair will welcome a motion to add Professor Wilks' name as a permanent
feature of the cover of the Annals of Mathematical Statistics.

While being grateful to Wilks and regretting his withdrawal, we should extend
a most hearty welcome to T. W. Anderson. Because of his scholarship,
broad vision combined with broadmindedness and because of his energy, he is
an excellent promise for the future of the Annals. It is a pleasure to express the
gratitude of the Institute to Columbia University and, in particular, to Dr.
Abraham Wald for providing the necessary facilities for the Editorial office of
the Annals.

Prior to embarking on the election of the new Editor, the I.M.S. Council
approved an important document prepared by a special Committee chaired by 
S. S. Wilks, formulating the editorial policy of the Institute. 

Of the many fundamental parts of this document I wish to mention the
following:


(i) "In establishing the editorial procedure, special care should be taken to
avoid the danger of the Annals becoming a one-group journal rather than
serving the Institute as a whole ... the refusal to publish a paper on
grounds of general policy (rather than because of some verifiable defects
such as mistakes, triviality, lack of new material, etc.) shall be based on
a unanimous agreement of the Editor and of all the Associate Editors."

The general idea behind these passages is, of course, that thus far, the Annals
[...] the Institute and should provide facilities for
all the different schools of thought. My understanding is that this includes the
biostatistician Cochran and the econo-statistician Koopmans, the multivariate
tolerant Wilks, the quality-control-minded Shewhart and
[...]ependently-limiting Loève, the necessary- and sufficient-normal Feller and
the statistically-cybernetic Wiener and
the general-sequential-decision-maker Wald. I should think that even our next





President, the stochastically-pz’ocessed-Markovian Doob, is meant to have a 
chance to publish in the Annals, from time to time. 

(ii) Another interesting point in the same document concerns the proposed
approximate distribution of space in the Annals:

(a) research papers on mathematical statistics proper: 60 per cent;
(b) research papers in borderline fields, including applications: 20 per cent;
(c) expository papers: 15 per cent;
(d) news, notices, etc.: 5 per cent.

Since in the past there was too little expository material, the Council instituted
the so-called Special Invited Papers, to be presented from time to time on
selected subjects. The text of these papers, accompanied by the prepared discussion,
will be printed in the Annals. The program of the present meeting includes
our first Special Invited Paper, by Michel Loève. It is hoped that the
Special Invited Papers will satisfy the need for expository material now felt by
the membership of the Institute. I am sure the Program Committees will appreciate
suggestions of the Members regarding the sections of the theory requiring
expository presentation.

The financial aspect of the publication program of the Institute was a continued
worry of the Council. As is well known, the Annals is overloaded with
papers and the cost of printing is growing constantly. In order to ease the situation
somewhat, our new Constitution was amended to include the provision
that the Universities and other institutions could become Institutional Members.
There is already some additional income from this source and, if all the members
of the Institute are energetic in urging their Departments to become Institutional
Members, this income may be quite substantial.

It is conceivable that some potential sources of funds exist, not directly available
for the Annals, which may be used for starting a new statistical journal.
In order to investigate this possibility a special committee was appointed under
the chairmanship of Professor Scheffé. This Committee did an excellent job in
trying to find a solution of the tremendously difficult problem and there is now
a reasonable hope that, in the not very distant future, our publication facilities
will be increased.

Another deep change in the structure of the Institute occurred this year.
Here I have in mind the resignation of Dr. Paul S. Dwyer, our long and hard
working Secretary, and the taking over by Dr. Carl Fischer. Dr. Dwyer’s resignation
was announced last year at the meeting at Cleveland and we expressed
to him our hearty thanks for his untiring work for the Institute. I wish to repeat
these thanks now and to accompany them by hearty congratulations on the
excellent program he prepared for this meeting in his new capacity as the Chairman
of the National Program Committee.

Until recently, there was a certain disequilibrium in the location of the meetings
of the Institute. Practically all of the meetings were held in the East and
the West Coast members could attend them only as a matter of exceptional luck.
Later, regional meetings were organized, and this year we have functioning three
Regional Program Committees, one for the East, one for the West Coast and
one for the Middle West. In addition, we have Program Committees for the two
National Meetings of the Institute. In parallel with the redistribution of meetings,
there was an increase in their number. This process was accompanied by
very efficient help on the part of the governmental organizations, of the
Office of Naval Research, the Air Force, and the Army, for the members of the
Institute to attend the meetings even if they are held at a considerable distance.
As a combined result of these developments it now may seem that there are too
many meetings. Undoubtedly, the number and the location of future meetings
of the Institute will be seriously discussed and adjusted to the existing needs.

Naturally, the help of the Governmental institutions was not limited to help
in travel. A considerable number of research projects in statistics are now in
progress in many institutions with excellent results for science, for the younger
people who are given the chance to make their first independent research work
without undue worry about food and shelter and, thus, for the country as a whole.
The first organization to support fundamental research in general, and in statistics
in particular, seems to be the Office of Naval Research. Its broadmindedness
and understanding of the spirit of research have established a very high
standard which is also sustained by other institutions. If permitted to function
as they do now, these institutions will mark an epoch in the development of
scholarly work in this country.

The following persons have accepted the appointment to the Nominating
Committee for the next year:

Henry Scheffé, Chairman
Albert H. Bowker
Paul G. Hoel 
Leonid Hurwicz 
Herbert E. Robbins 
David F. Votaw, Jr. 


Composition of the Committees of the Institute in 1949 


1. Program Committees (P.C.)

(i) Eastern P.C. for the April 1949 meeting in New York
Churchill Eisenhart, Chairman
W. G. Cochran
C. F. Kossack
S. B. Littauer
F. Mosteller

(ii) West Coast P.C. for the June meeting in Berkeley
M. A. Girshick, Chairman
Z. W. Birnbaum
W. J. Dixon
J. L. Hodges, Jr.
P. G. Hoel
A. M. Mood

(iii) National P.C. for the Summer Meeting at Boulder, Colorado
W. Feller, Chairman
J. L. Doob
M. A. Girshick
C. C. Hurd
J. Wolfowitz

(iv) Mid West P.C.
C. C. Craig, Chairman
L. Hurwicz
W. G. Madow
K. May
L. J. Savage
D. R. Whitney

(v) National P.C. for the December meeting in New York
P. S. Dwyer, Chairman
J. Berkson
G. W. Brown
C. Eisenhart
Mark Kac
H. Rubin

(vi) Eastern P.C. for the Spring 1950 meeting in North Carolina
H. Hotelling, Chairman
D. Blackwell
H. Geiringer
S. B. Littauer
D. F. Votaw, Jr.
S. S. Wilks

2. Committee for Special Invited Papers
J. W. Tukey, Program Coordinator, Chairman ex officio
C. C. Craig
W. Feller
P. S. Dwyer
M. A. Girshick
C. Eisenhart
H. Hotelling

3. Committee on Editorial Policy (1948-1949)
S. S. Wilks, Chairman
W. G. Cochran
W. Feller
M. A. Girshick
P. S. Olmstead
J. Neyman
W. A. Wallis
J. Wolfowitz

4. Committee to Nominate Candidates for the Editor of the Annals
Harry C. Carver, Chairman
Howard Levene
David Blackwell
Frederick Mosteller
S. Lee Crump
Herbert E. Robbins
Erich L. Lehmann

5. Committee on Tabulation
C. Eisenhart, Chairman
C. C. Hurd
C. I. Bliss
A. N. Lowan
F. W. Dresch
W. G. Madow
H. H. Germond
H. G. Romig
H. O. Hartley
L. E. Simon

6. Committee on Directory
John W. Tukey, Chairman
Churchill Eisenhart

7. Committee to Revive the Statistical Research Memoirs
Henry Scheffé, Chairman
C. C. Hurd
T. W. Anderson
George Kuznets
Walter Bartky

8. Rietz Lectures Committee
The Chairmanship of this Committee was accepted by Abraham Wald,
the first Rietz Lecturer, who undertook to make further appointments. These
are:

C. C. Craig 
W. Feller 

9. Committee to Encourage Membership outside of the United States 
T. W. Anderson, Chairman 
C. C. Hurd 
M. Loève
J. Marschak

10. Committee on Statisticians in the Government Service 
W. E. Deming, Chairman 

C. Eisenhart 

11. Representative of the I.M.S. to the American Association for the Advancement 
of Science 

Harold Hotelling 

12. Representative of the I.M.S. to the National Research Council, Division of 
Physical Sciences 

Walter Bartky (1948-1950) 

13. Representative of the I.M.S. to the Mathematical Policy Committee 

S. S. Wilks 

14. Representative of the I.M.S. to the Joint Committee for Development of Statistical
Applications in Engineering and Manufacturing 

Benjamin Epstein 

15. Representatives to the Inter-Society Cooperation on Mathematical Training of 
Social Scientists 

T. W. Anderson
J. L. Doob
S. S. Wilks

16. Committee to Determine the Duties and Responsibilities of the Program Com¬ 
mittees 

Harold Hotelling, Chairman 
M. A. Girshick
S. B. Littauer 


December 31, 1949

J. Neyman
President


REPORT OF THE SECRETARY-TREASURER OF THE INSTITUTE 

FOR 1949 

At the beginning of 1949 the Institute had 1101 members and during the
period covered by this report 153 new members (8 of whom begin their membership
with 1950) joined the Institute and two members were re-instated. During
1949 the Institute lost 87 members of which [...]
suspension for non-payment of dues. Judging from the information available at
this date, the Institute will have 1167 members as it starts 1950.

During 1949 the Constitution was amended to provide for a new class of membership:
Institutional Membership. Although the campaign for institutional
members started late in the year, by December 31 there were five universities on
the rolls: California, Purdue, Illinois, Princeton and North Carolina. It is hoped
that many more universities and corporations will enroll during 1950.

Meetings of the Institute held during 1949 included those at Columbia University
on April 8-9, at the Berkeley campus of the University of California on
June 16-18, at the University of Colorado on August 29-September 1, and at
New York City on December 27-30. The Secretary wishes to call attention to
the excellent work of the members who served as Assistant and Associate Secretaries
at these meetings: Professor S. B. Littauer at New York, Professor J. L.
Hodges, Jr., at California, Professor H. T. Guard at Colorado and Associate
Secretary Professor Littauer who was responsible for the New York Meeting.

The following Fellows served as members of the Committee on Fellows: C. 
C. Craig, chairman, T. W. Anderson, M. A. Girshick, Harold Hotelling, Henry 
Scheff6, and F. F. Stephan. 

The meeting scheduled for November 25-26 at the University of California 
at Los Angeles was cancelled by vote of the West Coast membership because of 
the proximity of the Boulder and Christmas Meetings. 

At the Council meeting at Boulder, August 29, 1949, the following Associate
Secretaries were elected:

Associate Secretary        Section
S. B. Littauer             Eastern
K. J. Arnold               Central
J. L. Hodges, Jr.          Western

By a mail vote of the Council, conducted during October, 1949, T. W. Anderson
was elected Editor for the period 1950-1952.

A summary of the financial status of the Institute is given below: 


FINANCIAL STATEMENT 
December 20, 1948 to December 31, 1949 

A. RECEIPTS 

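Balance on Hand,* December 20, 1948 ............... $ 7,121.01
Dues ..............................................   7,826.36
Contributions .....................................     166.16
Life Memberships ..................................     392.50
Institutional Memberships .........................     400.00
Subscriptions .....................................   4,779.07
Sale of Back Issues ...............................   3,314.41
Biometrika ........................................     793.50
Income from Investments ...........................     100.00
Miscellaneous .....................................     169.70

Total ............................................. $25,052.69

* In bank deposits and government bonds.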





B. EXPENDITURES

Annals: Current
  Office of the Editor ............................ $  275.00
  Waverly Press ...................................  8,777.08    $ 9,052.65

Annals: Back Numbers
  Reprinted Vol. III #[illegible]; IV #3 & #4; V #1;
  VI #1, 2, 3 & 4; XIII #1, 2 & 4 .............................    2,910.55

Mathematical Reviews and Inter-Society Committee ..............    [illegible]

Office of the Secretary-Treasurer
  Printing, memoranda, etc. (including some
  stamped envelopes) .............................. $1,150.61
  Postage, supplies, express, telephone calls .....    276.00
  Clerical help ...................................  2,208.40
  Travelling expense ..............................    223.61      3,863.62

Miscellaneous .................................................    [illegible]
Biometrika ....................................................    [illegible]
Balance on Hand,* December 31, 1949 ...........................  $ 7,982.08

Total .........................................................  $25,052.69


C. SUMMARY OF RECEIPTS AND EXPENDITURES

Balance on Hand,* December 20, 1948 ............... $ 7,121.01
Receipts during 1949 ..............................  17,931.68
Expenditures during 1949 ..........................  17,070.61
Balance on Hand,* December 31, 1949 ............... $ 7,982.08

D. LIFE MEMBERSHIP FUND

It has been the practice to set up an amount equal to all life membership payments as a
liability and to hold all these funds in reserve until the death of the member, after which
his payment is released to the general fund. There were three new life membership payments.

                               December 20, 1948    December 31, 1949
Number of Life Members .......        29                   32
Total Reserve Held ...........    $2,280.00            $2,672.50


E. BACK ISSUES FUND

It has been our policy, since January 1, 1948, to use income from the sale of back issues to
finance the additional reprinting of back issues.

Previous balance in back issues fund ..................... $  749.77
Income from the sale of back issues during 1949 ..........  3,314.41
Expense for reprinting back issues in 1949 ...............  2,910.55
Balance, December 31, 1949 ............................... $1,153.63


F. BALANCE SHEET, DECEMBER 31, 1949

ASSETS
                                                December 31,    Increase since
                                                    1949        December 20, 1948
Cash .......................................... $ 3,094.08        $  861.07
U. S. Government G Bonds ......................   3,000.00
U. S. Government F Bonds (Purchase price) .....   1,888.00             -
Current Accounts Receivable ...................     545.78           254.60
Estimated Value (Cost of Back Annals**) .......  16,459.22         3,673.61

                                                $24,987.08        $4,789.24

* In bank deposits and government bonds.
** Cost of Annals calculated at 67 cents per copy.

LIABILITIES

Reserve for Life Memberships .................. $ 2,672.50        $  392.50
Reserve for Reprinting Back Issues ............   1,153.63           403.86
Surplus .......................................  21,160.95         3,992.88

                                                $24,987.08        $4,789.24


G. SUMMARY

The surplus of the Institute has increased during the year of 1949 by $3,992.88. While this
indicates a favorable condition, it should be noted that roughly 92% of this gain is represented
by an increase in the inventory of back issues of the Annals. This asset is definitely
of the non-liquid sort and thus the major portion of our gain is of little assistance in meeting
our current need for more publication space in the Annals.

It should be noted that the year-end statements have always included a substantial
amount in prepaid dues and subscriptions on the asset side without a corresponding liability.
The figure for December 20, 1948 is $4,060.50 and for December 31, 1949 is $4,682.37.
Thus it will be seen that we are virtually running on a hand-to-mouth basis. It is hoped that
an increase in the number of individual and institutional memberships during 1950 will
bring us into a more favorable situation.

Beginning with January 1, 1950 we plan to revise the bookkeeping system which is no
longer adequate for an organization of our present size. In the future, these reports will be
made on an accrual basis rather than a cash basis and thus will present the data pertaining
to each year on a more realistic basis.

We are now in a position to supply all issues beginning with Volume 1. Five or six of the
back issues are in short supply, but we expect to be able to reprint these when our supplies
become exhausted, using receipts from the sale of back issues to pay for the reprinting.

Carl H. Fischer 

December 31, 1949 Secretary-Treasurer 


REPORT OF THE EDITOR OF THE ANNALS FOR 1949

The 1949 volume of the Annals exceeded, by a few pages, the 600 pages budgeted
for it at the beginning of the year. A total of 66 papers were published, as
well as the usual reports, abstracts, and items of news and notices. The 1949
volume was Volume 20 of the Annals, and it seemed fitting to publish a cumulative
index of papers for the first twenty volumes of the Annals. Such an index,
containing both author and subject indexes, has been published as a separate
31-page pamphlet and is being distributed with the December 1949 issue of
the Annals.

The rate of submission of manuscripts continues to increase. By the end of
1949 enough manuscripts to fill two issues of the Annals had been accepted for
publication. At the same time approximately forty manuscripts were at various
stages of refereeing and revision. This means that authors submitting manuscripts
at the beginning of 1950 can hardly expect to see their papers in print in
less than a year. The average gap between submission of manuscripts and their
appearance in print has, for the last two years, increased at a rate of about
two issues (six months) per year. There is no reason to predict that this rate will
change for at least another year or two. Thus, it is highly desirable that every
effort be made to expand the publication program of the Institute during 1950.





The most immediate possibility would be to expand the Annals by at least 100
pages if the budget will permit. In the meantime, it is hoped that the Institute
committee to study the feasibility of reviving the Statistical Research Memoirs
will be able to work out a practical plan for further increasing the publication
facilities of the Institute.

The manuscripts being submitted continue to cover a wide range of topics in
probability and statistics. There is still a scarcity of good review and expository
articles being submitted, but with the institution of special invited addresses so
widely discussed at the Cleveland meeting of the Institute in December, 1948,
we can expect to receive more review and expository articles in the future.

The Editor takes this opportunity to acknowledge, on behalf of the Editorial
Committee, the refereeing assistance which has been generously given during the
year by the following persons: A. C. Aitken, E. W. Barankin, Z. W. Birnbaum,
R. C. Bose, A. H. Bowker, G. W. Brown, K. L. Chung, W. J. Dixon, A.
Dvoretzky, Hilda Geiringer, L. A. Goodman, T. N. E. Greville, E. E. Grubbs,
John Gurland, M. H. Hansen, T. E. Harris, H. O. Hartley, E. L. Kaplan, B. F.
Kimball, T. Koopmans, Julius Lieblein, H. Levene, M. S. MacPhail, P. J.
McCarthy, R. B. Murphy, G. E. Noether, E. G. Olds, P. S. Olmstead, Richard
Otter, E. Paulson, M. P. Peisakoff, E. J. G. Pitman, Milton Sobel, D. F. Votaw,
Max Woodbury, and J. L. Walsh.

Thanks are due to Mr. M. E. Freeman, Mr. L. A. Goodman and Mr. E. F.
Whittlesey for preparation of manuscripts and to Mrs. Lily D. Smith for other
editorial and office assistance in connection with the Annals.

Finally, on behalf of the Editorial Board, which has had the responsibility for
editing the Annals since 1938, the Editor extends every good wish to the new
Editor, T. W. Anderson, and the new Editorial Board, who will inherit nearly
a full year of accepted manuscripts but will otherwise assume editorial responsibility
for the Annals beginning with the 1950 volume.


December 21, 1949

S. S. Wilks
Editor



THE IDENTIFICATION OF STRUCTURAL CHARACTERISTICS¹

By T. C. Koopmans and O. Reiersøl
Cowles Commission for Research in Economics 

1. Introduction. 

1.1. "Population" versus "structure." In a fundamental paper (Fisher, [1])
R. A. Fisher distinguished as the first group of problems in mathematical statistics
the "specification of the mathematical form of the population from which
the data are regarded as a sample." It is the purpose of this article to suggest a
reformulation of the specification problem, appropriate to many applications
of statistical methods, and to point out the consequent emergence of a new
group of problems, to be called identification problems.

In many fields the objective of the investigator’s inquisitiveness is not just
a "population" in the sense of a distribution of observable variables, but a
physical structure projected behind this distribution, by which the latter is
thought to be generated. The word "physical" is used merely to convey that
the structure concept is based on the investigator’s ideas as to the "explanation"
or "formation" of the phenomena studied, briefly, on his theory of these phenomena,
whether they are classified as physical in the literal sense, biological,
psychological, sociological, economic or otherwise. Examples of such structures,
drawn from the fields of economic fluctuations and of psychological factor
analysis, are given in Sections 3 and 4. More detailed discussions of these examples
can be found in other publications by the present authors and by others
[15], [19]. In this article, we are therefore not concerned with the merits of particular
assumptions entering into the specifications considered. Our examples
are used only as the basis for a generalizing formulation (Section 2) and a comparative
discussion (Section 5) of the identification problem, i.e., the problem
of drawing inferences from the probability distribution of the observed variables
to the underlying structure. The belief is here expressed that this is a general
and fundamental problem arising, in many fields of inquiry, as a concomitant
of the scientific procedure that postulates the existence of a structure.

The general formulation of the identification problem in Section 2 is, therefore,
held abstract. Some readers may prefer to give substance to the various
concepts by reading Sections 3-4 alongside Section 2. In addition, we insert
here a simple example showing the main features of the identification problem.


¹ To be included in Cowles Commission Papers, New Series, No. [illegible]. The authors reported
on this study in papers before the Berkeley meeting of the Institute of Mathematical
Statistics in June 1948. We are indebted to Dr. G. Rasch of the University of Copenhagen
and to Professor L. L. Thurstone of the University of Chicago for many fruitful discussions
on the subject matter of this article, for which the responsibility lies exclusively with the
authors.




1.2. A simple example of the identification problem. The example is concerned
with the problem of estimating the parameters $\alpha$, $\beta$ of a linear relationship

(1.1) $\eta_2 = \alpha + \beta \eta_1$

between two variables $\eta_1$ and $\eta_2$, both of which are observed only subject to
errors of observation $u_1$ and $u_2$. Thus, observations are available only for the
variables

(1.2) $y_i = \eta_i + u_i$, where $E(u_i) = 0$, $i = 1, 2$.

The question under what conditions a consistent estimate of $\beta$ exists has
repeatedly attracted attention. To discuss this question, we shall consider a
model in which $\eta_1$ is independent of $(u_1, u_2)$ and in which the joint distribution
of $u_1$ and $u_2$ is normal.

If also the distribution of $\eta_1$ is normal, it is easy to see that $\beta$ cannot be determined
from a knowledge of the joint probability distribution of the observed
variables $y_1$ and $y_2$.² In this case the joint distribution of $y_1$ and $y_2$ is also normal
and the distribution is completely characterized by five parameters, $E(y_1)$,
$E(y_2)$, $\operatorname{var}(y_1)$, $\operatorname{var}(y_2)$, and $\operatorname{cov}(y_1, y_2)$. The parameters $\beta$ and $\operatorname{var}(\eta_1)$ may
now be chosen in any way such that the second term in the right hand member
of

$$
\begin{bmatrix} \operatorname{var}(y_1) & \operatorname{cov}(y_1, y_2) \\ \operatorname{cov}(y_1, y_2) & \operatorname{var}(y_2) \end{bmatrix}
= \begin{bmatrix} 1 & \beta \\ \beta & \beta^2 \end{bmatrix} \operatorname{var}(\eta_1)
+ \begin{bmatrix} \operatorname{var}(u_1) & \operatorname{cov}(u_1, u_2) \\ \operatorname{cov}(u_1, u_2) & \operatorname{var}(u_2) \end{bmatrix}
$$

is a positive definite matrix. It is clear that if the left hand member is non-singular,
this condition can be met for any arbitrary value of $\beta$ combined with
a sufficiently small value of $\operatorname{var}(\eta_1)$.
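
The non-identifiability in the normal case can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names and all numerical values are hypothetical choices made for illustration and do not come from the article. It builds the covariance matrix of $(y_1, y_2)$ from one structure and then exhibits a second structure, with a different $\beta$ and a small $\operatorname{var}(\eta_1)$, that reproduces the same matrix with a positive definite error covariance.

```python
import numpy as np


def observed_cov(beta, var_eta1, cov_u):
    """Covariance matrix of (y1, y2) implied by beta, var(eta1) and cov(u1, u2)."""
    b = np.array([1.0, beta])
    return var_eta1 * np.outer(b, b) + cov_u


def equivalent_error_cov(sigma_y, beta, var_eta1):
    """Error covariance that, with the given beta and var(eta1), reproduces sigma_y.

    Raises ValueError when the implied error covariance is not positive definite,
    i.e. when var(eta1) is too large for this choice of beta.
    """
    b = np.array([1.0, beta])
    cov_u = sigma_y - var_eta1 * np.outer(b, b)
    if np.any(np.linalg.eigvalsh(cov_u) <= 0):
        raise ValueError("implied error covariance is not positive definite")
    return cov_u


# Hypothetical "true" structure: beta = 2, var(eta1) = 1.
true_cov_u = np.array([[0.25, 0.10], [0.10, 0.25]])
sigma_y = observed_cov(beta=2.0, var_eta1=1.0, cov_u=true_cov_u)

# A rival structure with an arbitrary beta and a small var(eta1).
rival_cov_u = equivalent_error_cov(sigma_y, beta=-1.0, var_eta1=0.05)
rival_sigma_y = observed_cov(beta=-1.0, var_eta1=0.05, cov_u=rival_cov_u)
```

Both structures imply the same second moments; the means $E(y_1)$ and $E(y_2)$ can be matched in the same way by adjusting $\alpha$ and $E(\eta_1)$, so under joint normality the observed distribution cannot distinguish the two values of $\beta$.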

It can be shown that $\beta$ is uniquely determined by the joint probability distribution
of $y_1$ and $y_2$ if this distribution is not normal. We shall prove this in
the case that certain semi-invariants exist.³

Let $\phi_{y_1 y_2}(t_1, t_2)$ denote the characteristic function of the joint distribution
of $y_1$ and $y_2$,

(1.3) $\phi_{y_1 y_2}(t_1, t_2) = E\, e^{i(t_1 y_1 + t_2 y_2)}$,

and let

(1.4) $\psi_{y_1 y_2}(t_1, t_2) = \log \phi_{y_1 y_2}(t_1, t_2)$.

Similar notations will be used for the characteristic functions of other random
variables, and the logarithms of these functions.

Since $(u_1, u_2)$ and $(\eta_1, \eta_2)$ are independent, we obtain

(1.5) $\psi_{y_1 y_2}(t_1, t_2) = \psi_{\eta_1 \eta_2}(t_1, t_2) + \psi_{u_1 u_2}(t_1, t_2)$,

and from equations (1.1) and (1.3) we obtain

$\phi_{\eta_1 \eta_2}(t_1, t_2) = e^{i \alpha t_2}\, \phi_{\eta_1}(t_1 + \beta t_2)$,

or

(1.6) $\psi_{\eta_1 \eta_2}(t_1, t_2) = i \alpha t_2 + \psi_{\eta_1}(t_1 + \beta t_2)$.

Combining (1.5) and (1.6), we have

(1.7) $\psi_{y_1 y_2}(t_1, t_2) = i \alpha t_2 + \psi_{\eta_1}(t_1 + \beta t_2) + \psi_{u_1 u_2}(t_1, t_2)$,

where $\psi_{u_1 u_2}(t_1, t_2)$ is a polynomial of second degree, since the joint distribution
of $u_1$ and $u_2$ is normal. Let $\kappa_{rs}$ be the semi-invariants of the distribution of $(y_1, y_2)$
and let $\kappa_r$ be the semi-invariants of the distribution of $\eta_1$. Comparing coefficients
in equation (1.7), we obtain

(1.8) $\kappa_{rs} = \beta^s \kappa_{r+s}$ $(r + s \geq 3)$

and from this equation again

(1.9) $\kappa_{rs} = \beta \kappa_{r+1,\, s-1}$ $(r + s \geq 3,\ s \geq 1)$.

If at least one $\kappa_{rs}$, with $r + s \geq 3$, is finite and different from zero (which
implies that the joint distribution of $y_1$ and $y_2$ is not normal), $\beta$ may be determined
from one such equation given the joint distribution function of $y_1$ and $y_2$.

² See [13], middle of page 70.

³ The following proof is analogous to that given by Geary [8] in the case when the $u$'s
are not supposed to be normally distributed, but independent.
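
Relation (1.9) with $r = 2$, $s = 1$ reads $\kappa_{21} = \beta \kappa_{30}$, and third-order semi-invariants coincide with third central moments. The following is a minimal simulation sketch of this relation, assuming NumPy; the exponential distribution for $\eta_1$, the error covariance, the sample size and the function name are hypothetical choices for illustration only, and sample moments stand in for the exact distribution that the argument presupposes.

```python
import numpy as np


def beta_from_third_cumulants(y1, y2):
    """Ratio kappa_21 / kappa_30 of sample third-order semi-invariants.

    Third-order semi-invariants equal third central moments, so sample
    central moments are used as plug-in values.
    """
    d1 = y1 - y1.mean()
    d2 = y2 - y2.mean()
    k30 = np.mean(d1 ** 3)
    k21 = np.mean(d1 ** 2 * d2)
    return k21 / k30


rng = np.random.default_rng(0)
n = 200_000
alpha, beta = 1.0, 2.0

# Non-normal "true" variable (third semi-invariant different from zero).
eta1 = rng.exponential(1.0, size=n)

# Jointly normal, correlated errors of observation, independent of eta1.
u = rng.multivariate_normal([0.0, 0.0], [[0.25, 0.10], [0.10, 0.25]], size=n)

y1 = eta1 + u[:, 0]
y2 = alpha + beta * eta1 + u[:, 1]

beta_hat = beta_from_third_cumulants(y1, y2)
```

Because the normal errors contribute nothing to semi-invariants of order three and higher, the ratio recovers $\beta$ up to sampling error, whereas the same ratio is undefined in the limit when $\eta_1$ is normal and $\kappa_{30} = 0$.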
1.3. Remarks on the history of the identification problem. The identification
problem has been discussed, in various terminologies and formulations, by
quantitative thinkers in several fields. It is interesting to note that most of the
contributions have come from researchers whose main attention was directed
to particular fields of application. For this reason, perhaps, its general formulation
was not attempted until recently.

In economics, contributions of increasing explicitness and generality were
made by Pigou [18], Henry Schultz [20], Frisch [3], [4], [5], [6], [7], Marschak [17].
The main contributions to the formalization and explicit mathematical analysis
of the problem were made so far by Haavelmo [9], Koopmans and Rubin [15],
Wald [24], and Hurwicz [10].

In his books on factor analysis [21], [22], Thurstone discusses in several places
questions of identifiability. Previously the lack of identifiability in a certain
factor analysis model had been demonstrated by numerical examples by G. H.
Thomson [23]. Models used in the analysis of latent structure in attitude and
opinion research by Lazarsfeld [16] give rise to similar identification problems.
In biometrics, the "method of path coefficients" of Sewall Wright [25] is essentially
a method where a structure is postulated behind the observable distribution,
and the identifiability of that structure discussed. The identification
problem is also met with in the theory of the design of experiments, particularly
in the method of confounding (Fisher [2], Chapter 7, Yates [26]). When confounding
is used, the identifiability of certain parameters (second order interactions,
say) is sacrificed in order to gain certain advantages in the testing of
hypotheses concerning (and in the estimation of) the parameters that remain
identifiable (main effects and first order interactions, say).

2. General formulation of the identification problem.

2.1. Latent variables, observed variables, and structure. In each of the examples
considered in this article, the distributional specification applies directly to
certain non-observable or in any case non-observed variables, variously referred
to as errors of observation (like $u_1$ and $u_2$ above), disturbances, "true" variables
(like $\eta_1$ above), specific factors, etc. We shall refer to these as latent variables,
denoted by a vector $u$. In addition, certain structural relationships, like (1.1)
and (1.2), are specified which connect the latent variables with the observed
variables, denoted by a vector $y$. The specification is therefore concerned with
the mathematical forms of both the distribution of the latent variables and the
relationships connecting observed and latent variables.

The term "mathematical form" carries a suggestion of parametric specification
which obviously is not the only possible type. We shall therefore employ terms
and concepts introduced by Hurwicz [10] which cover both parametric and non-parametric
specifications. By a structure $S = (F, \phi)$ we understand a particular
probability distribution function

(2.1) $F(u)$

of the latent variables (thought of, if you wish, as given numerically to a
desired degree of accuracy, either by a cumulative distribution surface or curve
or table, or parametrically by numerical values of the parameters), combined
with a particular structural relationship (or set of simultaneously valid relationships)

(2.2) $\phi(y, u) = 0$

between observed and latent variables (again given numerically by curves,
surfaces or parameters), which permits unique determination of the observed
variables $y$ from the values of the latent variables $u$ (except possibly for a set
of $u$-values occurring with probability zero). The corresponding probability
distribution

(2.3) $H(y \mid S)$

of the apparent variables is therefore uniquely determined by the structure $S$,
and is said to be generated by $S$.

2.2. Specification of a model. We shall use the term model to signify a set of
structures. We can thus say that the specification problem is concerned with
specifying a model $\mathfrak{S}$⁴ which by hypothesis contains the structure $S$ generating
the distribution $H$ of the observed variables.

⁴ A set will be denoted by a German character corresponding to the Latin character
denoting its representative element.


As a result of this reformulation of the specification problem, a new problem
of inference arises, which logically precedes all problems of estimation or of
testing hypotheses. It has already been deduced from the definition of structure
that a given structure $S$ generates one and only one probability distribution
$H(y \mid S)$ of the apparent variables. However, statistical inference from any
number of observations can relate only to characteristics of the distribution of
the observed variables. The limit of statistical inference is an exact knowledge
of this distribution function, a limit not attainable but approachable if very
large samples can be taken. Anything not implied in this distribution is not a
possible object of statistical inference.

2.3. Identifiability of structural clmracterishcs by a model. It is therefore a 
question of great practical importance whether a statement convei-se to the one 
just made is valid; can the distribution H of apparent variables, generated by a 
given structure S contained in a model @, be generated by only one structure in 
that model? This is by no means impHed in the definitions given, and it is not 
generally true. Whether or not it is true m a particular instance depends—as 
illustrated in our examples—always on the model ©, and often on the given 
structure S besides If it is true, we shall say that the model © identifies the given 
structure S, or that the structure S is identifiable by the model.° 

If a structure $S$ is not identifiable by a model $\mathfrak{S}$, some of its
characteristics may still be uniquely determinable. By a structural parameter
$\theta(S)$ we understand a functional of the structure $S$. (This definition
applies, of course, equally to the case of non-parametric specification of the
functions $F$, $\varphi$ defining the structure.) We further define that two
structures $S$ and $S^*$ are (observationally) equivalent if they generate the
same distribution of observed variables,

(2.4) $H(y \mid S) = H(y \mid S^*)$ for all $y$.

We then say that a model $\mathfrak{S}$ identifies a parameter $\theta(S)$ in a
structure $S_0$ if that parameter has the same value in all structures $S_0^*$
contained in $\mathfrak{S}$ and equivalent to $S_0$. This definition can obviously be
extended to characteristics $\chi(S)$ of a structure $S$, other than parameters,
such as the functional form of a relationship represented by a component of the
vector $\varphi$, etc.
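The definitions of equivalence and of identification of a parameter can be illustrated by a small finite sketch. The toy model below is an assumption introduced only for illustration and is not one of the paper's examples: a structure is a pair (a, b), and the distribution it generates is summarized by the single number a + b, as would be the case if y were normal with mean a + b and unit variance. The names `distribution` and `identifies` are hypothetical helpers.

```python
def distribution(structure):
    # Toy assumption: H(y | S) is fully described by the number a + b.
    a, b = structure
    return a + b

def identifies(model, parameter, s0):
    """True if `parameter` has the same value in every structure of `model`
    that is observationally equivalent to `s0`, in the sense of (2.4)."""
    equivalent = [s for s in model if distribution(s) == distribution(s0)]
    return all(parameter(s) == parameter(s0) for s in equivalent)

# A finite model: all pairs (a, b) with a, b in {0, 1, 2}.
model = [(a, b) for a in range(3) for b in range(3)]

total = lambda s: s[0] + s[1]   # a functional of the structure
first = lambda s: s[0]          # another functional of the structure

print(identifies(model, total, (1, 1)))  # True
print(identifies(model, first, (1, 1)))  # False: (0, 2) and (2, 0) are equivalent
print(identifies(model, first, (0, 0)))  # True: (0, 0) has no other equivalent
```

In this toy model the sum a + b is identified in every structure, whereas a alone is identified only in structures, such as (0, 0), that have no equivalent partner in the model.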

2.4. The identification problem. It has now become clear that our reformulation
of the specification problem has given rise to a new group of identification
problems: to determine which of the parameters or other characteristics of a
given structure are identifiable by (or "within") a given model.

It is perhaps premature to attempt assigning to identification problems a
definite place in a classification of statistical problems such as was undertaken
by Fisher. One might regard problems of identifiability as a necessary part of
the specification problem. We would consider such a classification acceptable,
provided the temptation to specify models in such a way as to produce
identifiability of relevant characteristics is resisted. Scientific honesty
demands that the specification of a model be based on prior knowledge of the
phenomenon studied and possibly on criteria of simplicity, but not on the desire
for identifiability of characteristics in which the researcher happens to be
interested.

⁵ The concept here designated briefly as "identifiability" has been called "unique
identifiability" in another context (Koopmans and Rubin [15], also Hurwicz [10]) in
contrast with "multiple" or "incomplete" identifiability.

Identification problems are not problems of statistical inference in a strict
sense, since the study of identifiability proceeds from a hypothetical exact
knowledge of the probability distribution of observed variables rather than
from a finite sample of observations. However, it is clear that the study of
identifiability is undertaken in order to explore the limitations of statistical
inference.

2.5. Identifiability is subject to statistical test. Further interpenetration of the
pre-statistical analysis of identifiability with problems of statistical inference
proper arises from the fact, amply illustrated by our examples, that the
identifiability of a structural characteristic $\chi(S)$ often depends not only on
the model, but also on the given structure $S$. Thus, each structural
characteristic $\chi$ divides the model $\mathfrak{S}$ exhaustively into two mutually
exclusive subsets of structures

(2.5) $\mathfrak{S} = \mathfrak{S}_\chi + \bar{\mathfrak{S}}_\chi$

(of which one may be empty), such that $\chi(S)$ is uniquely identifiable in $S_0$ by
the model if $S_0$ belongs to $\mathfrak{S}_\chi$, and not uniquely identifiable if $S_0$
belongs to $\bar{\mathfrak{S}}_\chi$.

We shall call $\chi(S)$ uniformly identifiable by $\mathfrak{S}$ if $\mathfrak{S}_\chi$ coincides with $\mathfrak{S}$.

The subdivision of $\mathfrak{S}$ into $\mathfrak{S}_\chi$ and $\bar{\mathfrak{S}}_\chi$ has an
important property: if $S_0$ belongs to $\mathfrak{S}_\chi$, then all structures $S_0^*$
equivalent to $S_0$ also belong to $\mathfrak{S}_\chi$, and a similar statement holds for
$\bar{\mathfrak{S}}_\chi$. This property follows directly from the definition of
identifiability of $\chi(S)$ given above. Its meaning is that the identifiability of
$\chi(S)$ in $S_0$ depends only on the distribution $H(y) = H(y \mid S_0)$ of observed
variables generated by $S_0$. To the subdivision of the model corresponds an
exhaustive subdivision

(2.6) $\mathfrak{H} = \mathfrak{H}_\chi + \bar{\mathfrak{H}}_\chi$

of the set

(2.7) $\mathfrak{H} = \mathfrak{H}(\mathfrak{S})$

of all distribution functions $H(y \mid S)$ generated by the structures $S$ of
$\mathfrak{S}$, into the subset $\mathfrak{H}_\chi$ containing those distribution functions
$H(y \mid S)$ generated by structures $S$ in which $\chi(S)$ is uniquely identifiable,
and the subset $\bar{\mathfrak{H}}_\chi$ containing functions $H(y \mid S)$ generated by
structures for which the opposite is true.

Hence, whenever the identifiability of $\chi(S)$ cannot be decided in the same
sense (affirmatively or negatively) for all structures $S$ of $\mathfrak{S}$ as a result
of either $\mathfrak{S}_\chi$ or $\bar{\mathfrak{S}}_\chi$ being empty, then the
identifiability of the characteristic $\chi(S)$ of the structure $S$ generating the
observations is a property of the distribution $H(y \mid S)$ of the observations.
This identifiability is equivalent to the hypothesis

(2.8) $H(y \mid S)$ belongs to $\mathfrak{H}_\chi$,

which is in principle⁶ subject to statistical test under the maintained hypothesis

(2.9) $H(y \mid S)$ belongs to $\mathfrak{H}$.

2.6. Testing particular specifications. Often the model is defined by one general
specification supplemented with a number of particular specifications which are
"detachable pieces" in the sense that they can be removed, added or replaced
by alternatives to construct alternative models. We may define the general
specification as a set $\mathfrak{S}$ of structures which is postulated to contain the
model $\mathfrak{S}'$ in question as a subset. Particular specifications can then be
defined as subsets $\mathfrak{S}_1, \mathfrak{S}_2, \ldots$ of $\mathfrak{S}$ of which the model
$\mathfrak{S}'$ is the intersection

(2.10) $\mathfrak{S}' = \mathfrak{S} \cap \mathfrak{S}_1 \cap \mathfrak{S}_2 \cap \cdots$.

An example is that of parametric specification of the "form" of the functions
$\varphi(y, u)$ defining the structural relationships and of the distribution function
$F(u)$ of latent variables as the general specification, and specifications of the
values of certain parameters of $\varphi$ and $F$ as particular specifications.

In such situations, it is an important question whether a given particular
specification is, again in principle, subject to statistical test. Whenever the
answer depends on the other particular specifications, we may ask further which
minimum set of other particular specifications must (together with the general
specification) be entered into the "maintained hypothesis" in order that the
given particular specification be subject to statistical test. A formal answer to
this question, facilitating specific answers in each concrete case, can be given
as follows.

Let a model $\mathfrak{S}$ be narrowed down to an alternative model

(2.11) $\mathfrak{S}' = \mathfrak{S} \cap \mathfrak{S}_1$

by a particular specification $\mathfrak{S}_1$. This particular specification will be
called observationally restrictive if the set $\mathfrak{H}(\mathfrak{S}')$ of all
distribution functions $H(y \mid S')$ of observed variables generated by the
structures $S'$ of $\mathfrak{S}'$ is a proper subset of the set $\mathfrak{H}(\mathfrak{S})$ of
all distribution functions $H(y \mid S)$ generated by the structures $S$ of
$\mathfrak{S}$. A statistical test of the particular specification $\mathfrak{S}_1$ can then be
constructed by choosing as the hypothesis subject to test

(2.12) $H(y)$ belongs to $\mathfrak{H}(\mathfrak{S}')$;

and as the maintained hypothesis

(2.13) $H(y)$ belongs to $\mathfrak{H}(\mathfrak{S})$.

The particular specification $\mathfrak{S}_1$ remains subject to test if the model
$\mathfrak{S}$ is stripped of those other particular specifications which are not
necessary for the observationally restrictive character of $\mathfrak{S}_1$, although of
course the outcome of the test may become either less or more certain as a result.


⁶ See sub-section 2.7 below.


A frequent case of an observationally restrictive specification is that where a
parameter $\theta(S)$, already identifiable in almost all structures $S$ of
$\mathfrak{S}$, is restricted by $\mathfrak{S}_1$ to a prescribed value (or to a prescribed
point set not containing all points of its domain for all $S$ of $\mathfrak{S}$). In this
case, the specification in question has been called overidentifying.
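The notion of an observationally restrictive specification in (2.11)-(2.13) can be sketched on a finite toy model. Everything in the block is an illustrative assumption rather than part of the paper: structures are pairs (a, b), the generated distribution is summarized by a + b, and the helper names are hypothetical.

```python
def distribution(structure):
    # Toy assumption: the generated distribution is summarized by a + b.
    a, b = structure
    return a + b

def generated(model):
    """The set of distributions generated by the structures of a model."""
    return {distribution(s) for s in model}

def observationally_restrictive(model, specification):
    """True if narrowing `model` by `specification` shrinks the set of
    generated distributions to a proper subset, as in (2.11)-(2.13)."""
    narrowed = [s for s in model if specification(s)]
    return generated(narrowed) < generated(model)  # `<` tests proper subset

model = [(a, b) for a in range(3) for b in range(3)]

b_is_zero = lambda s: s[1] == 0         # only the sums 0, 1, 2 remain
a_at_least_b = lambda s: s[0] >= s[1]   # every sum 0..4 is still attainable

print(observationally_restrictive(model, b_is_zero))     # True
print(observationally_restrictive(model, a_at_least_b))  # False
```

The second specification narrows the model without narrowing the set of generated distributions, so no amount of data on y could contradict it.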

2.7. Remarks on the testing of hypotheses. In subsections 2.5 and 2.6 we have
without further inquiry applied the expression "hypothesis in principle subject
to test" to any hypothesis which narrows down the set $\mathfrak{H}$ of distribution
functions $H$ generated by structures of the model to a proper subset
$\mathfrak{H}'$. It will be clear that, to make a test actually possible, $\mathfrak{H}'$
cannot be allowed to lie everywhere dense in $\mathfrak{H}$. For instance, if
$\mathfrak{H}$ is defined parametrically, a hypothesis restricting $\mathfrak{H}'$ to rational
values of the parameters is clearly not subject to statistical test. Just what
set-theoretical requirements on $\mathfrak{H}'$ are needed to make a test possible is a
separate problem which we shall not attempt to discuss.

We have also in another sense oversimplified the problem of testing particular
specifications. In practice this problem presents itself as the choice of one out
of many possible combinations of several particular specifications, rather than
a number of separate and unconnected choices between the rejection and the
adoption of each particular specification under consideration. Present theory
of choice between two alternatives does not meet this situation.


3. An econometric example.⁷

In econometric studies⁸ economic fluctuations have been described by a system
of difference equations in (observed) economic variables $y$, subject to two kinds
of outside influences, emanating respectively from (observed) exogenous (i.e.,
non-economic) variables $z$, and from (latent) random disturbances $u$. Each of
these equations is given a definite meaning in terms of economic behavior. There
may for instance be equations explaining respectively consumption expenditure
(from incomes of various groups, price changes, etc.), the supply of consumers'
goods (from price margins between such goods and their raw materials and labor,
productive capacity, etc.), investment expenditure, the supply of capital goods,
etc. The purpose of the identification discussion is to investigate whether, on
the basis of given a priori knowledge as to the form of these equations, and in
particular as to what variables occur in any designated equation, procedures of
estimation or testing of hypotheses can be directed to the parameters of the
equations of economic behavior themselves, rather than to the parameters of
"secondary" equations dependent on (derivable from) two or more of the
behavior equations.

In the case of linear systems of equations, a possible form for the general
specification (the model $\mathfrak{S}$) is as follows.

(3.1) $B_0 y'(t) + B_1 y'(t-1) + \cdots + B_{\tau_{\max}} y'(t - \tau_{\max}) + \Gamma z'(t) = u'(t)$

represents the structural relationships. Here $y'(t)$, $z'(t)$, $u'(t)$ are column
vectors (the transposes of row vectors) of $G$, $K$ and $G$ elements,
respectively, for each discrete time point or period $t = 1, 2, \ldots, T$, also
$t = 0, -1, \ldots, 1 - \tau_{\max}$, for $y'(t)$. $B_0, B_1, \ldots, B_{\tau_{\max}}$ are square
matrices of order $G$, and $\Gamma$ is a matrix of $G$ rows and $K$ columns.

⁷ For an expository discussion of identification problems in econometric models see [14].

⁸ See, for instance, J. Tinbergen [33] and L. R. Klein [12].

(3.2) $B_0$ is non-singular.

(3.3) The observed values $z(t)$, $t = 1, \ldots, T$, are held constant in repeated
samples, and the components of $z(t)$ are linearly independent.

(3.4) The components of $u(t)$ have a joint distribution function $F(u)$ (with
zero means and finite variances) which is independent of $t$ and of $z(t)$.

(3.5) $u(t)$ and $u(t')$ are independently distributed if $t \neq t'$.

Particular specifications $\mathfrak{S}_1, \mathfrak{S}_2, \ldots$ that have been most
frequently employed indicate prescribed values (usually zero) of specified
elements of the matrix

(3.6) $A \equiv [B_0 \;\; B_1 \;\; \cdots \;\; \Gamma]$

or of given linear functions of the elements of the $g$-th row $\alpha(g)$ of $A$, for
each value $g = 1, \ldots, G$ of $g$. It can always be arranged that of the linear
restrictions on any one row of $A$, at most one is non-homogeneous (normalization
rule), the others homogeneous. The homogeneous restrictions state which
variables enter into each equation, and possibly with which ratios between some
of their coefficients.

It has been shown [15] that in the model $\mathfrak{S}$ a necessary and sufficient
condition for the equivalence of two structures $S = \{F(u), A\}$ and
$S^* = \{F^*(u^*), A^*\}$ is that they are connected by a linear transformation

(3.7) $A^* = TA$, $u'^* = Tu'$,

with non-singular matrix $T$. By definition, the model

(3.8) $\mathfrak{S}' = \mathfrak{S} \cap \mathfrak{S}_1 \cap \mathfrak{S}_2 \cap \cdots$

identifies a parameter $\alpha_{gk}$ if, whenever $A$ and $A^*$ belong to equivalent
structures $S$ and $S^*$, respectively, of $\mathfrak{S}'$, we have

(3.9) $\alpha^*_{gk} = \alpha_{gk}$.

In order to attain such identifiability by linear restrictions on the $g$-th row of
$A$ it is necessary that one non-homogeneous restriction (normalization rule) on
the $g$-th row of $A$ be specified in $\mathfrak{S}'$. Recalling that $G$ represents the
number of rows (and the rank) of $A$, it can be proved that it is further necessary
for the simultaneous identifiability of all elements $\alpha_{gk}$, $k = 1, 2, \ldots$, in
the $g$-th row $\alpha(g)$ of $A$, that at least $G - 1$ additional homogeneous
restrictions be imposed on that row, say

(3.10) $\alpha(g)\Phi'(g) = 0$, $\rho\{\Phi'(g)\} \geq G - 1$,

where $\alpha(g) \equiv [\alpha_{g1} \;\; \alpha_{g2} \;\; \cdots]$, the $\Phi(g)$ are given matrices
(often with elements 0 or 1 only), and $\rho(X)$ denotes the rank of $X$. These
restrictions (3.10) are also sufficient (in addition to the normalization rule) if

(3.11) $\rho\{A\Phi'(g)\} = G - 1$.

The $g$-th row of the "rank criterion matrix" $A\Phi'(g)$ in (3.11) consists of zeros
only, because of (3.10). Therefore, (3.11) requires the other rows of that matrix
to be linearly independent.⁹

Thus, even if the model $\mathfrak{S}'$ includes, besides a normalization rule, the
necessary condition (3.10) for the identifiability of the $g$-th behavior equation,
such identifiability is still absent in certain structures, corresponding to a point
set (generally of measure zero) in the space of the coefficients of the remaining
equations, viz., the point set in which (3.11) is not satisfied. Whether or not $A$
actually falls within this point set is, as was stated before in more general terms,
a property of the joint distribution function $H(y \mid z)$ of the observations $y$,
and is therefore subject to statistical test. In the present case, this is also seen
from the fact that the rank of $A\Phi'(g)$ is preserved by the transformation (3.7),
and is therefore itself an identifiable parameter.
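The rank criterion (3.11) can be checked mechanically. The following is a minimal sketch under stated assumptions: a two-equation system (G = 2) without lags, columns of A ordered as (y1, y2, z1, z2), arbitrary illustrative coefficients, and a matrix Φ'(g) with elements 0 or 1 that simply selects the columns excluded from equation g. The function names are hypothetical.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def rank_criterion(A, excluded_columns):
    """Check rho{A Phi'(g)} = G - 1 when Phi'(g) selects `excluded_columns`,
    i.e. the columns whose coefficients are restricted to zero in row g."""
    G = len(A)
    selected = [[row[c] for c in excluded_columns] for row in A]
    return rank(selected) == G - 1

# Row 0 excludes z2 (column 3); row 1 excludes z1 (column 2).
A = [[1, 2, 3, 0],
     [1, -1, 0, 5]]
# Same restrictions, but z2 does not actually enter row 1 either.
A_degenerate = [[1, 2, 3, 0],
                [1, -1, 0, 0]]

print(rank_criterion(A, [3]))             # True
print(rank_criterion(A_degenerate, [3]))  # False
```

The degenerate case corresponds to the exceptional point set mentioned above: the restriction on row 0 is formally present, but the excluded variable is absent from the other equation as well, so the rank criterion matrix has rank 0 instead of G - 1.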

For certain scientific purposes explicit knowledge of $A$ is unnecessary. One
such purpose is "prediction without change in structure," i.e., prediction of a
value of $y(t)$ for a future time $t$ from a hypothetical value of $z(t)$ on the
assumption that $A$ and $F(u)$ have not changed between the observation period
and the time point to which the prediction applies. Such prediction can be based
on the knowledge of (a) the population regressions

(3.12) $y'(t) = \Pi_1 y'(t-1) + \cdots + \Pi_{\tau_{\max}} y'(t - \tau_{\max}) + \Pi_z z'(t) + v'(t)$

of the "jointly dependent" variables $y(t)$ on the "predetermined" variables
$y(t-1), \ldots, y(t - \tau_{\max}), z(t)$ and of (b) the distribution function $K(v)$ of
the population residuals

(3.13) $v(t) = y(t) - E\{y(t) \mid y(t-1), \ldots, y(t - \tau_{\max}), z(t)\}$

from these regressions. Of course, the matrices $\Pi$ are functions of the
structural parameters (3.6) through

(3.14) $[-I \;\; \Pi] \equiv [-I \;\; \Pi_1 \;\; \cdots \;\; \Pi_z] = -B_0^{-1} A$

and $K(v)$ can be derived from $F(u)$ through the transformation

(3.15) $v' = B_0^{-1} u'$.

The important fact is that $\Pi$ and $K(v)$, by their definitions, depend only on the
distribution function $H(y \mid z)$ of the observations, and are therefore uniformly
identifiable. This is also reflected in the fact that the right hand members of
(3.14) and (3.15) are invariant for the transformation (3.7).
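The invariance just mentioned can be verified in one line each. The following worked step is a sketch that writes $\Pi$ for the reduced-form matrices, $v$ for the reduced-form residuals and $T$ for the non-singular matrix of the transformation (3.7), and uses only the fact that $A^* = TA$ implies $B_0^* = TB_0$ for the leading block.

```latex
\begin{align*}
A^* = TA \;&\Longrightarrow\; B_0^* = T B_0, \\
-(B_0^*)^{-1} A^* &= -(T B_0)^{-1} T A = -B_0^{-1} T^{-1} T A = -B_0^{-1} A, \\
(B_0^*)^{-1} u'^* &= (T B_0)^{-1} T u' = B_0^{-1} T^{-1} T u' = B_0^{-1} u'.
\end{align*}
```

Equivalent structures therefore share the same reduced-form coefficients and the same reduced-form residuals, which is why knowledge of $A$ is not needed for prediction without change in structure.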


⁹ In that case, overidentification of $\alpha(g)$ will result if the inequality sign in (3.10) holds.


However, the most relevant economic problems are those in which a change
in $A$ or $F(u)$ is actually or hypothetically present, and in which therefore the
identifiability of the relevant parts or functions of $A$ and of the characteristics
of $F(u)$ requires separate inquiry.¹⁰

4. An example from factor analysis.¹¹ Factor analysis has been presented in
different forms by different authors. We shall here consider the multiple factor
analysis of Thurstone only [21], [22].

The factor analysis methods were developed primarily for the purpose of
analyzing intelligence tests, but they have also been used for other psychological
problems and in other sciences.

Suppose that a person is given a battery of $G$ tests. Let his score in test $i$
be $y_i$. The fundamental assumption in factor analysis is that these scores can
be explained in terms of a relatively small number of hypothetical primary
factors. Let $z_1, z_2, \ldots, z_p$ denote the hypothetical scores of the person in the
common factors, i.e., those primary factors which are common to at least two
tests in the battery. We assume that $y_i$ is a homogeneous linear function of
the scores $z_k$ plus a unique part $v_i$, which may be thought of as consisting of
an error term plus the contribution of a specific factor. The coefficients
$\pi_{ik}$ in the linear function just mentioned are called factor loadings. The factor
loading $\pi_{ik}$ expresses the relative importance of the common factor $k$ in the
answering of test $i$.

We shall introduce the row vectors $y = [y_i]$, $z = [z_k]$, $v = [v_i]$ and the matrix
$\Pi = [\pi_{ik}]$. The covariance matrices of the sets of variables $y$, $z$, and $v$
will be denoted by $M_{yy}$, $M_{zz}$, and $\Lambda$, respectively.

In contrast with the preceding example, the variables y are the only observed 
variables. The variables v and z are latent variables. 

Our model will be given by the following specifications: 

(4.1) $y' = \Pi z' + v'$.

(4.2) $E(z) = 0$ and $E(v) = 0$.

(4.3) The set of variables $z$ is stochastically independent of the set of variables $v$.

(4.4) $\Lambda$ is diagonal and different from 0.

(4.5) The elements of $z$ and $v$ are jointly normally distributed.

(4.6) Each $y_i$ is correlated with at least one of the other $y$'s.

(4.7) The rank of $\Pi$ equals the number $p$ of its columns.

(4.8) $M_{zz}$ is nonsingular.

(4.9) $p$ is the smallest number of variables $z$ which is compatible with the joint
probability distribution of the observed variables $y$ and Specifications
(4.1)-(4.8).

(4.10) Each column of $\Pi$ contains at least $p$ zeros (in unspecified places).

(4.11) A normalization rule fixing the units of the variables $z$ and a rule fixing
the order of the columns of $\Pi$.

¹⁰ See Hurwicz [11].

¹¹ Proofs of the statements in this section will be found in a separate paper by one of
the authors (Reiersøl [10]). It should be noted that the notation is different in the two
papers. In the separate paper the notation is close to that of Thurstone. In the present
paper the notation has been chosen to correspond in some way to the notation in the
econometric example. A list of corresponding symbols in the present paper and in
Thurstone's books follows.

Present paper   $y_i$   $z_k$   $\pi_{ik}$   $G$   $p$   $M_{yy}$   $M_{zz}$   $\Lambda$
Thurstone       $s_j$   $x_m$   $a_{jm}$    $n$   $r$   $R_1$      $R_{pq}$   $R_1 - R$

It should be noted that $M_{yy}$, $M_{zz}$ and $\Lambda$ are covariance matrices of the
original variables, while $R_1$, $R_{pq}$, and $R_1 - R$ are covariance matrices of
standardized variables.


Denote by $\Pi_k$ the matrix consisting of all the rows of $\Pi$ which have a zero in
the $k$-th column. Let the number of rows in the matrix $\Pi_k$ be $p_k$. Let
$\Pi_{kr}$ denote the submatrix of $\Pi_k$ which we get when deleting the $r$-th row of
$\Pi_k$. Using these notations we shall formulate the final specification of our
model.

(4.12) The rank of each of the matrices $\Pi_{kr}$ ($k = 1, 2, \ldots, p$; $r = 1, 2, \ldots, p_k$)
is $p - 1$.

Specification (4.1) represents the structural relationships.
Specification (4.10) means that the experimenter thinks he can construct a
sufficient number of tests where at least one of the common primary factors is
absent.

We shall first consider a model $\mathfrak{S}$ containing Specifications (4.1)-(4.9) only.
From (4.9) it follows that $p$ is uniformly identifiable.

Let $p_0 = \tfrac{1}{2}\left(2G + 1 - \sqrt{8G + 1}\right)$. If $p > p_0$, the matrix $\Lambda$ is
generally not identifiable. If $p < p_0$, $\Lambda$ generally is identifiable. When
$p = p_0$, the number of values of $\Lambda$ which correspond to a given covariance
matrix $M_{yy}$ is usually finite, and may be equal to one or greater than one. The
matrices $\Pi$ and $M_{zz}$ are never identifiable in the model $\mathfrak{S}$. If $\Lambda$ is
identifiable, the set of all structures $\{\Pi^*, M_{zz}^*, \Lambda\}$ equivalent to the
structure $\{\Pi, M_{zz}, \Lambda\}$ is given by the set of all matrices

(4.13) $\Pi^* = \Pi \Psi$

and

(4.14) $M_{zz}^* = \Psi^{-1} M_{zz} \Psi'^{-1}$,

where $\Psi$ is any square, $p$-rowed and nonsingular matrix.
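The critical value $p_0$ is easy to tabulate. The sketch below computes it for a few battery sizes and, as an added heuristic that is not taken from the text, compares it with a parameter count: the G(G + 1)/2 distinct elements of the covariance matrix of y against the Gp + p(p + 1)/2 + G parameters of the loadings, factor covariances and unique variances, net of the p² degrees of freedom of the transformation in (4.13). The function names `p0` and `surplus` are hypothetical.

```python
import math

def p0(G):
    """Critical number of factors p0 = (2G + 1 - sqrt(8G + 1)) / 2."""
    return (2 * G + 1 - math.sqrt(8 * G + 1)) / 2

def surplus(G, p):
    """Distinct covariance elements minus net free parameters,
    which simplifies to ((G - p)**2 - (G + p)) / 2."""
    return ((G - p) ** 2 - (G + p)) / 2

for G in (3, 6, 10, 15):
    # The surplus vanishes exactly at p = p0 and is positive for p < p0.
    print(G, p0(G), surplus(G, p0(G)))
```

For G = 6 tests, for example, the critical value is 3, so on this count two common factors leave a positive surplus of observable moments while four do not.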

In the following we shall confine our discussion to the case $p < p_0$, and to
structures in which the matrix $M_{yy}$ is such that $\Lambda$ is identifiable in $\mathfrak{S}$.


We shall now consider the model $\mathfrak{S}'$ defined by Specifications (4.1)-(4.11).
In this model a necessary and sufficient condition for the identifiability of $\Pi$ is
that any square $p$-rowed minor of $\Pi$ which is of rank $p - 1$ is contained in one
of the matrices $\Pi_k$. This condition excludes the possibility that the elements
belonging to the intersection of $p - 1$ rows and two columns of $\Pi$ are all equal
to zero. In order to be able to use this result, the experimenter would have to be
able to construct tests where one, but not more than one, common factor would
be absent. Therefore the result is not particularly useful. In order not to exclude
the case where two common factors occur in more than $p - 2$ tests, we have
introduced Specification (4.12).

We shall finally consider the model $\mathfrak{S}''$ defined by Specifications (4.1)-(4.12).
Assuming $M_{yy}$ known, we can determine some value $\Pi^*$ of $\Pi$ which satisfies
Specifications (4.1)-(4.9). Since, by assumption, $\Lambda$ is identifiable in
$\mathfrak{S}$, $\Pi^*$ must be of the form $\Pi\Psi$, where $\Pi$ is the true factor loadings
matrix and $\Psi$ is non-singular. Let $\Pi_k^*$ be a submatrix of $\Pi^*$ containing all
the columns of $\Pi^*$ and satisfying the following conditions:

(4.15) The rank of $\Pi_k^*$ is $p - 1$.

(4.16) The addition to $\Pi_k^*$ of a row contained in $\Pi^*$ but not in $\Pi_k^*$
increases the rank to $p$.

(4.17) Each submatrix of $\Pi_k^*$ obtained by deleting one row of $\Pi_k^*$ has rank $p - 1$.

A necessary and sufficient condition for the identifiability of $\Pi$ in the complete
model $\mathfrak{S}''$ is that there exist exactly $p$ submatrices $\Pi_k^*$ of $\Pi^*$ which
satisfy conditions (4.15)-(4.17), and that the $p$ vectors $\xi_k$, satisfying the
equations $\Pi_k^* \xi_k' = 0$ when $k = 1, 2, \ldots, p$, are linearly independent.

It should be noted that Specifications (4.10) and (4.12) are observationally
restrictive, i.e., they are in principle subject to statistical test.

5. A comparative discussion of the examples given. Some comparative remarks
on the three examples given in sections 1.2, 3 and 4 may illustrate our
general discussion of the identification problem, given in section 2.

In each of the three examples considered, the model contains a general
specification prescribing a parametric form of the structural relationships (2.2).
Further particular specifications therefore take the form of parameter
specifications in the function $\varphi(y, u)$ in (2.2) and possibly in the distribution
function (2.1) of latent variables. A comparison of the three examples shows a
striking formal similarity of the identification problems to which they give rise.
This similarity justifies our speaking of identification problems as a separate
group of problems preparatory to statistical inference, of quite widespread
occurrence. The same definitions of structure, model, parameter, identifiability
are applicable and useful in each example. In all three cases, parameters occur,
the identifiability of which depends on other identifiable structural
characteristics (the normality of a distribution function in one case, the ranks of
parameter matrices in the other two cases).



Our remaining remarks will be drawn from the econometric and factor analysis
examples only, partly because these illustrate the identification problem in
greater elaboration, partly because the closer similarity of these examples permits
us to notice interesting differences in greater detail.

Let us consider the particular case of the econometric example when there are
no time lags between the $y$'s in the structural relationships, i.e., when
$\tau_{\max} = 0$. In this case the reduced form in the econometric example is of the
same form as equation (4.1), which defines the structural relationships in the
factor analysis example. The notation in the factor analysis example has been
chosen with this similarity in mind. However, it should be emphasized that, while
the variables $y$ are observed in both examples and the variables $v$ are latent in
both examples, the variables $z$ are observed in the econometric example and
latent in the factor analysis example, and even the number of variables $z$ is an
unknown parameter $p$ in the latter example. For this reason, the discussion of the
identifiability of $\Lambda$ in factor analysis has no counterpart in the econometric
model. Furthermore, the identifiability of the matrix $\Pi$, which is automatic and
uniform in the econometric model $\mathfrak{S}_e$, say, requires detailed specifications in
the factor analysis model $\mathfrak{S}_f$, say, including the diagonality of $\Lambda$ and
prescriptions about the number of zero elements in each column.

The observability of $z$ in the econometric case is exploited to formulate, behind
the reduced form (3.12), a structure $\{F(u), A\}$ to be identified (where possible)
from further specifications based on economic theory. Here we meet with another
analogy, with differences, between the identification problem of $A$ in
$\mathfrak{S}_e$ and that of $\Pi$ (given $\Lambda$) in $\mathfrak{S}_f$. In the latter problem, the set of
matrices $\Pi^*$, belonging to a set of equivalent structures, is given by equation
(4.13). This equation is analogous to the first of the equations (3.7) in the
econometric case, with $\Pi$ in $\mathfrak{S}_f$ now corresponding to $A'$ in $\mathfrak{S}_e$.

If we were to specify zeros in assigned places in the factor loadings matrix $\Pi$,
and to introduce a normalization rule for each column of $\Pi$, the results quoted
in the econometric example would immediately be applicable to the factor analysis
case. A necessary condition for the identifiability of $\Pi$, given that of
$\Lambda$, would be that the number of specified zeros in each column of $\Pi$ be at
least $p - 1$. Necessary and sufficient for identifiability would be that the matrix
consisting of all rows of $\Pi$ which have specified zeros in the $k$-th column, be of
the rank $p - 1$, for each value of $k$.


However, instead of specifying that given elements of $\Pi$ be equal to zero,
Thurstone assumes that we know that there is a certain minimum number of
zeros in each column, but that we do not know which particular elements are
zero. The specification of a certain number of zeros in undesignated places
obviously represents a weaker assumption than the specification of the same
number of zeros in designated places. It is therefore not surprising that the
specification of $p - 1$ zeros in undesignated places in each column is never
sufficient for identifiability of $\Pi$. We have therefore introduced the stronger
specification (4.10). We have seen that even this specification is too weak to be
practically useful, and have introduced the additional Specification (4.12), which
makes the factor analysis model still more different from the econometric model.

Continuing the analogy in which $A'$ in $\mathfrak{S}_e$ corresponds to $\Pi$ in
$\mathfrak{S}_f$, we note an important feature common to both examples, and present in
other situations as well. Even if specifications sufficient, in number and variety
of "points of application," for the identifiability of all structural parameters
cannot be derived from a priori considerations, it remains possible to construct
uniformly identifiable functions of these parameters, knowledge of which
constitutes scientific information of more limited usefulness.

In the econometric example we have already seen that for certain purposes a
knowledge of the uniformly identifiable matrix $\Pi$ of the reduced form is
sufficient, while for other purposes we need to know the matrix $A$. As a further
illustration, suppose that we want to test for persistence of the structure by
comparing the equation systems which we estimate from data for two different
periods. Disregarding errors of estimation (which are not our present topic), if
$A$ is the same in both cases, $\Pi$ will also be the same in both cases. It is
therefore possible to arrive at a rejection of the persistence hypothesis by
determining $\Pi$ in both cases. Suppose next that one row (or several rows) of $A$
are different in the two periods, while the other rows of $A$ are identical in the
two cases. If $B_0$ changes from one period to the other, we may expect each
element of $\Pi$ to change. If we can determine $A$ for each period, the equality (as
between periods) of some of the rows of $A$ will indicate precisely the extent of
validity of the persistence hypothesis. If we cannot determine $A$ but only $\Pi$ in
each case, this verification will be lost.

Similarly, it may in factor analysis be sufficient for some purposes to consider
what we may call the reduced form of $\Pi$. Let $\Pi_I$ be the upper square part of
$\Pi$, which we shall assume to be nonsingular. The matrix $\Delta = \Pi \Pi_I^{-1}$ will
be called the reduced form of $\Pi$. It will be of the form

$\Delta = \begin{bmatrix} I \\ \Delta_{II} \end{bmatrix}$.

$\Delta$ is always identifiable when $\Lambda$ is identifiable.
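The identifiability of the reduced form follows directly from (4.13). In the worked step below, $\Delta$ denotes the reduced form $\Pi\Pi_I^{-1}$, $\Pi_I$ and $\Pi_{II}$ denote the upper square part and the remaining rows of $\Pi$, and $\Psi$ is the non-singular matrix of (4.13); these symbol choices are notational assumptions made for this sketch.

```latex
\begin{align*}
\Pi^* &= \Pi \Psi \;\Longrightarrow\; \Pi_I^* = \Pi_I \Psi, \\
\Delta^* &= \Pi^* (\Pi_I^*)^{-1} = \Pi \Psi \Psi^{-1} \Pi_I^{-1} = \Pi \Pi_I^{-1} = \Delta, \\
\Delta &= \begin{bmatrix} \Pi_I \\ \Pi_{II} \end{bmatrix} \Pi_I^{-1}
        = \begin{bmatrix} I \\ \Pi_{II} \Pi_I^{-1} \end{bmatrix}.
\end{align*}
```

Every structure equivalent to the given one thus yields the same reduced form, and the last line shows why its upper block is the identity matrix.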


Suppose now that the same battery of tests is given to two different populations.
Suppose that some of the factor loadings are different in the two populations,
while other factor loadings are the same. If at least one of the different factor
loadings occurs in the matrix $\Pi_I$, then each element of the lower block
$\Delta_{II}$ of the reduced form may be expected to change, and the partial identity
of the two structures cannot be discovered if we determine the reduced form
$\Delta$ only and not $\Pi$. On the other hand, if $\Pi$ is the same in both cases,
$\Delta$ will also be the same in both populations.

Let us next consider two different batteries given to the same population.
We shall suppose that the two batteries have some tests in common. For each test
which is common to the two batteries we ought to find the same factor loadings
in both batteries. In other words, the matrices $\Pi$ in the two cases ought to be
partly identical. On the other hand, if $\Pi_I$ contains rows corresponding to tests
which are not common to the two batteries, the matrices $\Delta_{II}$ will be entirely
different in the two cases. Therefore, again, identification of $\Pi$ will be
necessary to verify the equality of the factor loadings of tests common to both
batteries.





A final remark relates to observationally restrictive specifications. Particularly
where the model is to a large degree speculative, empirical confirmation of
the validity or usefulness of the model is obtained only to the extent that
observationally restrictive specifications are upheld by the data. Thus, Thurstone
emphasizes that the number of factors $p$ should be well below the value $p_0$ found
above to be necessary in general for the identifiability of $\Lambda$, before a factor
analysis can be regarded as successful (Thurstone [22], p. 291).

In econometric work, greater reliance is sometimes placed on a priori
specification of the form of a behavior equation, particularly the variables
occurring in it. If the linear restrictions on an equation in a linear system are just
sufficient for its identifiability, estimation of the parameters of that equation is
possible, but none of the identifying restrictions are themselves subject to test.
Again, dependence on a priori information is diminished (but not eliminated) to
the extent that a greater number of overidentifying restrictions are imposed and
are upheld by the data.


REFERENCES

[1] R. A. Fisher, "On the mathematical foundations of theoretical statistics," Phil. Trans. Roy. Soc., London, Ser. A, Vol. 222 (1922), p. 309.
[2] R. A. Fisher, The Design of Experiments, Oliver and Boyd, Edinburgh and London, 1935.
[3] R. Frisch, "Correlation and scatter in statistical variables," Nordic Stat. Jour., Vol. 1 (1928), p. 36.
[4] R. Frisch and B. D. Mudgett, "Statistical correlation and the theory of cluster types," Jour. Am. Stat. Assn., Vol. 26 (1931), p. 375.
[5] R. Frisch, "Pitfalls in the statistical construction of demand and supply curves," Veröffentlichungen der Frankfurter Ges. für Konjunkturforschung, Neue Folge, Heft 5, Leipzig, 1933.
[6] R. Frisch, Statistical Confluence Analysis by Means of Complete Regression Systems, Publ. No. 5, Universitetets Økonomiske Institutt, Oslo, 1934.
[7] R. Frisch, Statistical Versus Theoretical Relations in Economic Macrodynamics, mimeographed document for League of Nations conference, 1938.
[8] R. C. Geary, "Inherent relations between random variables," Proc. Roy. Irish Acad., Vol. 47 (1942), Sect. A, No. 6.
[9] T. Haavelmo, "The probability approach in econometrics," Econometrica, Vol. 12 (1944), Suppl.
[10] L. Hurwicz, "Generalization of the concept of identification," Statistical Inference in Dynamic Economic Models, Cowles Commission Monograph 10, New York, John Wiley and Sons, 1950.
[11] L. Hurwicz, "Prediction and least-squares," ibid.
[12] L. R. Klein, "Economic fluctuations in the United States, 1921-1941," Cowles Commission Monograph 11, New York, John Wiley and Sons, 1950.
[13] T. Koopmans, Linear Regression Analysis of Economic Time Series, Netherlands Economic Institute, Publ. No. 20, Haarlem, 1937.
[14] T. Koopmans, "Identification problems in economic model construction," Econometrica, Vol. 17 (1949).
[15] T. Koopmans, H. Rubin, and R. B. Leipnik, "Measuring the equation systems of dynamic economics," Statistical Inference in Dynamic Economic Models, Cowles Commission Monograph 10, New York, John Wiley and Sons, 1950.





[16] P. F. Lazarsfeld, "The logical and mathematical foundation of latent structure analysis," "The interpretation of some latent structures," Measurement and Prediction, Vol. 4, Studies in Psychology of World War II, 1950.
[17] J. Marschak, "Economic interdependence and statistical analysis," Studies in Mathematical Economics and Econometrics, University of Chicago Press, 1942, pp. 135-150.
[18] A. C. Pigou, "A method of determining the numerical values of elasticities of demand," Economic Journal, Vol. 20 (1910), pp. 636-640.
[19] O. Reiersøl, "On the identifiability of parameters in Thurstone's multiple factor analysis," Psychometrika, Vol. 15 (1950).
[20] H. Schultz, Theory and Measurement of Demand, University of Chicago Press, 1938.
[21] L. L. Thurstone, The Vectors of Mind, University of Chicago Press, 1935.
[22] L. L. Thurstone, Multiple-factor Analysis, University of Chicago Press, 1947.
[23] J. Tinbergen, Business Cycles in the United States of America, 1919-1932, League of Nations, Geneva, 1939.
[24] A. Wald, "Note on the identification of economic relations," Statistical Inference in Dynamic Economic Models, Cowles Commission Monograph 10, New York, John Wiley and Sons, 1950.
[25] S. Wright, "The method of path coefficients," Annals of Math. Stat., Vol. 5 (1934), p. 161.
[26] F. Yates, "The design and analysis of factorial experiments," Imp. Bur. Soil Sci. Tech. Comm., No. 35, 1937.
[27] G. H. Thomson, "The proof or disproof of the existence of general ability," Brit. Jour. Psych., Vol. 9 (1919), p. 323.



SOME PROBLEMS IN MINIMAX POINT ESTIMATION 


By J. L. Hodges, Jr., and E. L. Lehmann

University of California, Berkeley

1. Summary. In the present paper the problem of point estimation is considered in terms of risk functions, without the customary restriction to unbiased estimates. It is shown that, whenever the loss is a convex function of the estimate, it suffices from the risk viewpoint to consider only nonrandomized estimates. For a number of specific problems the minimax estimates are found explicitly, using the squared error as loss. Certain minimax prediction problems are also solved.


2. Introduction. The principles most commonly applied in the selection of a point estimate are the principles of maximum likelihood (R. A. Fisher) and of minimum variance unbiased estimation (Markoff).² Both of these principles are intuitively appealing, but neither of them can be justified very well in a systematic development of statistics. This holds also for some modifications of these principles proposed by G. W. Brown [1], as the author himself points out.

In an important early paper [2], Wald indicated a more systematic approach to the problem, which he later developed into his general theory of statistical decision problems [3, 4, 5]. Consider a random variable $X$ distributed over a space $\mathcal{X}$ according to a distribution $P_\theta$ with $\theta \in \Omega$. It is desired to estimate some function $g(\theta)$. If the value $x$ of $X$ is observed one makes an estimate, say $f(x)$, and thereby incurs a loss of $W[g(\theta), f(x)]$ when $\theta$ is the true value of the parameter. We shall assume that the loss function is nonnegative. It then follows that the expectation of the loss will always exist (although it may be infinite). The risk associated with the estimate $f$ is defined to be the expected loss, as given by

(2.1)    $R_f(\theta) = E_\theta W[g(\theta), f(X)].$

The choice of estimate should then be made according to the risk function. As a particular possibility Wald suggests the use of minimax estimates, i.e. estimates which minimize $\sup_\theta R_f(\theta)$.

The main purpose of the present paper is to obtain minimax estimates for a number of specific problems. Only few such problems have been worked out so far, the emphasis in Wald's work having been on the general theory. In [2] Wald obtained the minimax estimate of an unknown location parameter. Stein and Wald [6] treated the sequential problem of estimating the mean of a normal distribution with known variance, and in his forthcoming book Wald considers as an example the sequential problem of estimating the mean of a random variable distributed uniformly over an interval of length 1.

¹ This work was supported in part by the Office of Naval Research.

² Actually, the principle of minimum variance unbiased estimation [...] Gauss [...], Leipzig, 1891, and R. L. Plackett, "A historical note on the method of least squares," Biometrika, Vol. 36 (1949), p. 458.

It seems worthwhile to consider further special problems both because one may obtain estimates that in some cases are preferable to the conventional ones, and because these examples throw some light on the general desirability of the minimax principle. As we shall see below, it does not seem possible to reach any definite conclusions on this latter point, and to obtain a generally valid comparison between the minimax estimate and, for example, the unbiased estimate with uniformly smallest variance (when such an estimate exists).

Consider, for example, the problem of estimating the probability of success from a number of independent trials each of which may be a success or a failure, when the loss function is the squared error. If the number of trials is one, the minimax estimate (as is shown below) is given by $f(X) = \tfrac{1}{2}X + \tfrac{1}{4}$, where $X$ is 1 or 0 as the trial is a success or failure. As is easily seen, this estimate has smaller risk than the usual estimate $f^*(X) = X$ whenever $0.07 \le p \le 0.93$. On the other hand, when the number of trials is large the standard estimate $\bar{X}$ (the proportion of successes) has smaller risk than the minimax estimate nearly everywhere. The minimax estimate is only slightly better in a small interval centered at $p = \tfrac{1}{2}$, whose length tends to zero as the number of trials tends to infinity, and is worse everywhere else.
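The one-trial comparison can be checked by direct computation. The following is a minimal sketch in Python, not part of the original paper; the function names are illustrative assumptions. It evaluates the two exact squared-error risks for a single Bernoulli trial.

```python
def risk_minimax_one_trial(p):
    """Exact squared-error risk of f(X) = X/2 + 1/4, X ~ Bernoulli(p)."""
    return p * (0.75 - p) ** 2 + (1 - p) * (0.25 - p) ** 2


def risk_usual_one_trial(p):
    """Exact squared-error risk of f*(X) = X, which equals p(1 - p)."""
    return p * (1 - p) ** 2 + (1 - p) * p ** 2


if __name__ == "__main__":
    # Print both risks at a few values of p on either side of 0.07 and 0.93.
    for p in (0.05, 0.07, 0.25, 0.50, 0.93, 0.95):
        print(p, risk_minimax_one_trial(p), risk_usual_one_trial(p))
```

The first risk is constant in $p$, while the second is $p(1-p)$; the two curves cross where $p(1-p) = 1/16$, that is, near $p = 0.067$ and $p = 0.933$.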

For our purpose it is convenient to formulate the problem of point estimation as follows (see in this connection [7]). A random variable $X$ is distributed over a space $\mathcal{X}$ according to a distribution $P$ belonging to a family $\mathcal{F}$. We wish to estimate $g(P)$ where $g$ is a function whose domain is $\mathcal{F}$ and whose range is contained in some space $\mathcal{A}$ (in any example $\mathcal{A}$ is usually a Euclidean space, mostly even a one dimensional Euclidean space). An estimate is a statistic $f(X)$ taking on values in $\mathcal{A}$. We denote by $W[g(P), f(x)]$ the loss which results from making the estimate $f(x)$ when $P$ is the true distribution, and we define the risk function of the estimate $f$ by

(2.2)    $R_f(P) = E_P W[g(P), f(X)].$

The problem is to determine $f$ so as to minimize $\sup_{P \in \mathcal{F}} R_f(P)$.

Our principal tool will be the following theorem, which is essentially contained in Wald's work but which is not stated there explicitly. The theorem is a slight modification of one used for the theory of testing in [8].

THEOREM 2.1. Let $\{P_\theta\}$, $\theta \in \omega$ (where $\omega$ is a subset of a Euclidean space), be a parametric subfamily of $\mathcal{F}$, and let $\lambda$ be a probability measure over $\omega$. Suppose that $f$ minimizes

(2.3)    $\int_\omega E_\theta W[g(P_\theta), f(X)] \, d\lambda(\theta)$

and that

(i) $E_\theta W[g(P_\theta), f(X)]$ is constant (say $c$) for all $\theta \in \omega$,

(ii) $E_P W[g(P), f(X)] \le c$ for all $P$ in $\mathcal{F}$.

Then $f$ is a minimax estimate for estimating $g$.





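PROOF. Let $f^*$ be any other estimate of $g$. Then

(2.4)    $\sup_{P \in \mathcal{F}} E_P W[g(P), f(X)] = \int_\omega E_\theta W[g(P_\theta), f(X)] \, d\lambda(\theta)$
         $\le \int_\omega E_\theta W[g(P_\theta), f^*(X)] \, d\lambda(\theta)$
         $\le \sup_{P \in \mathcal{F}} E_P W[g(P), f^*(X)].$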

We note that if $f$ is the unique function minimizing (2.3), then the first inequality in (2.4) becomes strict, and hence $f$ is the unique minimax estimate of $g$.

Following Wald we shall call the function $f$ that minimizes (2.3) the Bayes estimate of $g$ associated with the a priori distribution $\lambda$. As a corollary to theorem 2.1, we note that a Bayes estimate whose risk function is constant is a minimax estimate.

3. Randomization. In the formulation of the problem of point estimation given above, the estimate $f(x)$ is assumed to be completely determined by the observed value $x$ of the random variable $X$. In the present section a broader formulation of the problem will be considered, in which the estimate corresponding to $x$ may itself be a random variable, say $T_x$. This extension is a special case of the notion of randomized decision function introduced by Wald in his general decision theory. We associate with each $x$ in $\mathcal{X}$ a probability distribution $F_x$, with the convention that when $X$ is observed to have the value $x$, we estimate $g(P)$ by means of a random variable $T_x$ which is distributed according to $F_x$. Estimates of this latter kind we shall call randomized, and the fixed estimates $f(x)$ nonrandomized.

The motivation behind the admission of randomized estimates (or more generally of randomized statistical decision functions) is that in some problems of statistical inference the performance of the decision function is considerably improved by randomization. It is clear however that the randomized functions are more complicated, and hence that it is useful to know when their consideration is not necessary. Before investigating this question we give the following definition, which makes precise a sense in which certain estimates may be omitted from consideration. (See Wald [9].)

DEFINITION. For a given estimation problem a class $C$ of estimates will be said to be essentially complete with respect to a class $D$ of estimates, if for every estimate $g$ in $D$ there exists an estimate $f$ in $C$ such that $R_f(P) \le R_g(P)$ for all $P$ in $\mathcal{F}$. If $D$ is the class of all randomized estimates we simply say that $C$ is essentially complete for the given problem.

It is clear that if one adopts the risk function point of view, one loses nothing by restricting consideration to an essentially complete class of estimates. In the present section we find conditions under which the totality of nonrandomized estimates forms an essentially complete class.





For this purpose we need the notion of convexity. A set $S$ in a $k$-dimensional Euclidean space is said to be convex if, whenever $P$ and $Q$ are in $S$, then all points on the line segment from $P$ to $Q$ are also in $S$. A real valued function $\phi$ defined over a $k$-dimensional Euclidean space is said to be convex, if for any points $(x_1, \cdots, x_k)$ and $(y_1, \cdots, y_k)$ of the space, and any number $0 < \alpha < 1$ we have

(3.1)    $\alpha \phi(x_1, \cdots, x_k) + (1 - \alpha) \phi(y_1, \cdots, y_k) \ge \phi(\alpha x_1 + (1 - \alpha) y_1, \cdots, \alpha x_k + (1 - \alpha) y_k).$


We use the following notation for conditional expectation. If $U$ and $V$ are two random variables which have a joint distribution, then $E(U \mid v)$ denotes the conditional expectation of $U$ given that $V = v$, and $E(U \mid S)$ denotes the conditional expectation of $U$ given that $V$ is in $S$. Let $\psi(v) = E(U \mid v)$; then for $\psi(V)$ we write $E(U \mid V)$.

LEMMA 3.1. Let $U$, $V$ be two random variables with a joint distribution, such that $U$ is distributed in a $k$-dimensional space and $E(U)$ is finite. Let $\phi$ be a real-valued convex function defined over this space and bounded from below. Then

$E\{\phi[E(U \mid V)]\} \le E[\phi(U)].$

PROOF. The proof is immediate in the special case that, for almost all $v$, there exists a determination of the conditional probability distribution of $U$ given $v$ which is a measure. We then know, from the convexity of $\phi$, that for almost all values $v$ of $V$, $\phi[E(U \mid v)] \le E[\phi(U) \mid v]$. Replacing $v$ by $V$ and taking expectations of both sides, we obtain the desired result.

If we do not assume the existence of conditional measures, the proof is more complicated. Since $E(U)$ is finite, there exists a function $E(U \mid v)$ such that for any set $S$, $E(U \mid S) = E\{E(U \mid V) \mid S\}$; see [10], p. 47. Since $\phi$ is convex it is measurable, and since $\phi$ is bounded from below $E[\phi(U)]$ exists. Excluding the trivial case $E[\phi(U)] = \infty$, we know there exists a function $E\{\phi(U) \mid v\}$ such that for any set $S$, $E\{\phi(U) \mid S\} = E\{E[\phi(U) \mid V] \mid S\}$.

If the lemma were false, we should have $E\{E[\phi(U) \mid V]\} < E\{\phi[E(U \mid V)]\}$, and could find an $\epsilon > 0$ and a set $A$ of positive $V$ measure such that for every $v \in A$, $E[\phi(U) \mid v] + 2\epsilon < \phi[E(U \mid v)]$. This implies the existence of a number $d$ and a set $B$ of positive $V$ measure such that for every $v \in B$, $E[\phi(U) \mid v] \le d$ and $d + \epsilon \le \phi[E(U \mid v)]$. Since $\phi$ is convex, the domain $D$ of points $P$ for which $\phi(P) < d + \epsilon$ is convex, and we may find a subset $C$ of $B$, of positive $V$ measure, for which the set of points $E(U \mid v)$, $v \in C$, lies in a convex domain $E$ disjoint of $D$. It follows that $E(U \mid C)$ lies in $E$, and hence that $\phi[E(U \mid C)] \ge d + \epsilon$. Clearly $d \ge E[\phi(U) \mid C]$. Thus we have the contradiction $E[\phi(U) \mid C] < \phi[E(U \mid C)]$.

DEFINITION. A loss function $W$ will be called convex if for every $u$, $W(u, v)$ is a convex function of the estimate $v$.

An example of a convex loss function is provided by the Markoff principle of estimation. The variance of an unbiased estimate may be considered as a risk function if we take the loss function to be the squared error, i.e. the square of the difference between the true value $g(P)$ and the estimated value $f(x)$; and this loss function is clearly convex.

THEOREM 3.2. If the loss function $W$ is convex, if $\mathcal{A}$ is in a Euclidean space, and if we consider only estimates having a finite expectation, then the class of nonrandomized estimates is essentially complete.

PROOF. Let $T_x$ be any randomized estimate such that $E(T_X)$ exists and is finite. Applying lemma 3.1 we see that $E(T_X \mid X)$, which is a function of $X$ only and hence is a nonrandomized estimate, has a risk never greater than that of $T_x$.
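The argument can be illustrated numerically. The sketch below is not from the original paper: it assumes a binomial observation and a hypothetical randomized estimate that adds a symmetric two-point perturbation to $x/n$, and it compares its exact risk with that of the nonrandomized estimate $E(T_x \mid x)$ under two convex losses. All names are illustrative.

```python
from math import comb


def binom_pmf(n, p):
    """Binomial probabilities P(X = x), x = 0, ..., n."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def squared_error(u, v):
    return (u - v) ** 2


def absolute_error(u, v):
    return abs(u - v)


def risk_randomized(F, loss, n, p):
    """Exact risk when, given X = x, the estimate takes value t with
    probability w for each pair (t, w) in F[x]."""
    pmf = binom_pmf(n, p)
    return sum(
        pmf[x] * sum(w * loss(p, t) for t, w in F[x]) for x in range(n + 1)
    )


def derandomize(F):
    """Replace each distribution F[x] by a point mass at its mean E(T_x | x)."""
    return [[(sum(w * t for t, w in Fx), 1.0)] for Fx in F]


if __name__ == "__main__":
    n, delta = 3, 0.1
    # Hypothetical randomized estimate: x/n shifted by +delta or -delta.
    F = [[(x / n - delta, 0.5), (x / n + delta, 0.5)] for x in range(n + 1)]
    G = derandomize(F)
    for p in (0.1, 0.5, 0.9):
        print(p, risk_randomized(F, squared_error, n, p),
              risk_randomized(G, squared_error, n, p))
```

For squared error the randomized risk exceeds the nonrandomized one by exactly $\delta^2$; for absolute error the inequality holds but need not be strict.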

The restriction in theorem 3.2 to estimates having finite expectation may be replaced by the requirement that for each $u$ there exists a number $d_u$ such that if $|v - u| \ge d_u$, then $W(u, v) > W(u, u)$. With this requirement and the convexity assumption, it follows that the risk associated with $T_x$ is infinite whenever $E(T_X)$ is infinite.

Theorem 3.2 is related to a generalization of a theorem of Blackwell. If $Y$ is a sufficient statistic for $\mathcal{F}$, and if for almost all $y$ the conditional distribution of $X$ given $y$ exists in the sense of measure, we may regard estimation of $g(P)$ based on $X$ as randomized estimation of $g(P)$ based on $Y$, and if the assumptions of theorem 3.2 are satisfied, we may apply this theorem to conclude the essential completeness of the class of nonrandomized estimates based on $Y$. In the general case we may resort again to lemma 3.1 to prove the following theorem; the proof is the same as that of theorem 3.2 if $X$ is replaced by $Y$ throughout.

THEOREM 3.3. If the loss function $W$ is convex, if $\mathcal{A}$ is in a Euclidean space, if we consider only estimates having a finite expectation, and if $Y$ is a sufficient statistic for $\mathcal{F}$, then the class of nonrandomized estimates which are functions of $Y$ only is essentially complete.

Blackwell [11] proved that if $U$ is a sufficient statistic for a real-valued parameter $\theta$, and if $T$ is an unbiased estimate for $\theta$, then $E(T \mid U)$, which is a function of $U$ only and also an unbiased estimate for $\theta$, has a variance which never exceeds that of $T$. Observing that the theorems above hold true when we restrict attention to unbiased estimates, Blackwell's result may be obtained from theorem 3.3 by letting $\mathcal{A}$ be one-dimensional, letting $W$ be the squared error, and restricting ourselves to unbiased estimates. In a similar manner we can get from theorem 3.3 an extension of Blackwell's theorem given by Barankin [12], who treated the case in which $W(\theta, t) = |\theta - t|^s$, $s \ge 1$. It is clear that these loss functions are convex.

If the convexity assumption is removed, theorems 3.2 and 3.3 cease to be true. For example, if $\mathcal{X}$ has only $n$ points, if $\mathcal{A}$ is a finite line segment of length greater than $2na$, and if the loss is 0 whenever $|g(P) - f(x)| \le a$ and 1 otherwise, then the minimax risk among nonrandomized estimates is 1. By admitting randomization, however, the maximum risk can be brought below 1 without using $X$ at all; if our estimate $T$ is uniformly distributed over $\mathcal{A}$, then the maximum risk will be $1 - a/(\text{length of } \mathcal{A})$.

The example just given may seem inappropriate, in that with the specified loss function the problem would customarily be considered one of interval estimation rather than point estimation. This objection does not apply however to the loss functions considered in the following theorem.

THEOREM 3.4. Let $\mathcal{X} = \{0, 1, \cdots, n\}$, $n \ge 1$. Let $\mathcal{F}$ be the set of binomial distributions defined by

$P_p(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad 0 \le p \le 1.$

Let $\mathcal{A}$ be the closed interval $[0, 1]$ and $g(P_p) = p$. Let $W(p, t) = |p - t|^s$, $0 < s < 1$. Then no minimax estimate can be nonrandomized, and the class of nonrandomized estimates is not essentially complete.

PROOF. For any nonrandomized estimate $f$, $R_f(p)$, being a sum of products of continuous functions of $p$, is itself a continuous function of $p$. The nonrandomized minimax risk is less than 1, as may be shown by considering any estimate of the following kind: $f(0) = 0$, $f(n) = 1$, and $0 < f(x) < 1$ for all other $x$. Here $R_f(0) = R_f(1) = 0$, while if $0 < p < 1$, $R_f(p) \le \max_x |p - f(x)|^s < 1$. By continuity $\sup_{0 \le p \le 1} R_f(p) < 1$.

It is easy to see that there exists among the nonrandomized estimates a minimax estimate, say $h(x)$. Let the corresponding minimax risk be denoted by $M$. We know that $M = \sup_{0 \le p \le 1} R_h(p) < 1$. It is obvious that $M > 0$. Observe that $h(0) < 1$, since $h(0) = 1$ leads to the contradiction $R_h(0) = |h(0)|^s = 1$. We can write

$R_h(p) = \sum_{h(x) = h(0)} P_p(X = x) \cdot |p - h(x)|^s + \sum_{h(x) \ne h(0)} P_p(X = x) \cdot |p - h(x)|^s.$

The second sum has a finite derivative with respect to $p$ at $p = h(0)$, while the first sum increases with infinite speed as $p$ is moved away from $h(0)$. Therefore $R_h[h(0)] < M$; and by an exactly symmetrical argument, $0 < h(n)$ and $R_h[h(n)] < M$. Using the continuity of $R_h$, we can find a positive number $\omega$ so small that $R_h(p) < M$ whenever $|p - h(0)| < \omega$ or $|p - h(n)| < \omega$.

Consider now the randomized estimate $T_x$ defined by $T_x = h(x)$ if $0 < x < n$, and by $T_x = h(x) + \alpha Y$ otherwise, where $Y$ is a random variable independent of $X$ and taking on the values 1 and $-1$ each with probability $\tfrac{1}{2}$, and where $0 < \alpha < \omega$. Observe

$R_T(p) - R_h(p) = (1 - p)^n \left[ \tfrac{1}{2} \{ |p - h(0) - \alpha|^s + |p - h(0) + \alpha|^s \} - |p - h(0)|^s \right] + p^n \left[ \tfrac{1}{2} \{ |p - h(n) + \alpha|^s + |p - h(n) - \alpha|^s \} - |p - h(n)|^s \right].$

By the concavity of the functions involved, the first square bracketed term is negative whenever $|p - h(0)| \ge \alpha$, and the second is negative whenever $|p - h(n)| \ge \alpha$. We can choose $\alpha$ so small that whenever either $|p - h(0)|$ or $|p - h(n)|$ is less than $\alpha$, $R_T(p) < M$. A continuity argument now shows that $\sup_{0 \le p \le 1} R_T(p) < M$. But this proves that no minimax estimate, with randomization permitted, can be nonrandomized. It is also now obvious that the class of nonrandomized estimates is not essentially complete: every nonrandomized estimate must have a risk function which somewhere exceeds $\sup_p R_T(p)$.





4. General properties of minimax estimation. Whether a principle such as the minimax principle is a desirable one has to be decided mainly on two criteria:

(i) its general properties, and

(ii) its performance in many particular instances.

It has already been remarked that in the second respect the minimax principle does not seem entirely satisfactory. With regard to the former, one great advantage of this principle is that when there is a unique minimax estimate, it is admissible. Here an estimate $f$ is said to be admissible (see [8]) if there exists no other estimate $f^*$ such that $R_{f^*}(P) \le R_f(P)$ for all $P$ in $\mathcal{F}$ with strict inequality holding for some $P$. It is interesting that, as we shall show below, this admissibility property is not shared by either the principle of unbiasedness or the maximum likelihood principle.

In this connection we begin by proving another theorem concerning essentially complete classes.

THEOREM 4.1. Suppose that the space $\mathcal{A}$ is a finite interval $[a, b]$ on the real line, and that for each $u \in \mathcal{A}$, $W(u, v)$ is a non-decreasing function of $v$ when $v > u$ and a non-increasing function of $v$ when $v < u$. Then the class of estimates whose range is contained in $\mathcal{A}$ is essentially complete with respect to the class of all real valued estimates.

PROOF. If $T$ is any real-valued estimate, define $T^*$ by

(4.1)    $T^* = \begin{cases} T & \text{if } a \le T \le b, \\ a & \text{if } T < a, \\ b & \text{if } T > b. \end{cases}$

It is clear that $R_{T^*}(P) \le R_T(P)$ for every $P \in \mathcal{F}$.
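The truncation (4.1) is easy to try out. The sketch below is an illustration only, under assumptions not made in the paper: a binomial observation, a parameter known to lie in $[a, b]$, the estimate $x/n$, and squared error loss. The function names are hypothetical.

```python
from math import comb


def truncate(t, a, b):
    """The map T -> T* of (4.1): pull values outside [a, b] back to the nearest end point."""
    return min(max(t, a), b)


def risk(estimate, n, p):
    """Exact squared-error risk of a nonrandomized estimate under Binomial(n, p)."""
    return sum(
        comb(n, x) * p**x * (1 - p) ** (n - x) * (estimate(x) - p) ** 2
        for x in range(n + 1)
    )


if __name__ == "__main__":
    n, a, b = 5, 0.25, 0.75
    raw = lambda x: x / n
    clipped = lambda x: truncate(x / n, a, b)
    for p in (0.25, 0.4, 0.5, 0.75):
        print(p, risk(raw, n, p), risk(clipped, n, p))
```

For every $p$ in $[a, b]$ the truncated estimate has risk no larger than the untruncated one, as theorem 4.1 asserts for any loss of the stated monotone kind.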

Halmos [7] has provided an example in which the minimum variance unbiased estimate takes on, with positive probability, values outside the range of the parameter. It can be shown from the proof of theorem 4.1 that in this case any unbiased estimate is inadmissible, provided the loss function is of the kind described in theorem 4.1.

That the maximum likelihood principle may also lead to inadmissible estimates is easy to show, since this is the case in many familiar situations. The following example may be of interest in that here the maximum likelihood estimate is uniformly worst among all estimates which one would consider using.


EXAMPLE. Let $X$ be a random variable with only 0 and 1 as possible values, and let $P(X = 1) = p$, where $\tfrac{1}{3} \le p \le \tfrac{2}{3}$. Then the maximum likelihood estimate for $p$ is easily seen to be $\tfrac{1}{3}(X + 1)$, and, if the loss function is the squared error, the associated risk function is $\tfrac{1}{3}(p - \tfrac{1}{2})^2 + \tfrac{1}{36}$. This risk [...]
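The risk formula in the example can be verified directly. The following sketch is not from the paper; as an added illustration, chosen here and not taken from the source, it also compares the maximum likelihood estimate with the constant estimate $\tfrac{1}{2}$.

```python
def risk_mle(p):
    """Exact squared-error risk of the maximum likelihood estimate (X + 1)/3,
    where X = 1 with probability p and X = 0 otherwise."""
    return p * (2 / 3 - p) ** 2 + (1 - p) * (1 / 3 - p) ** 2


def risk_formula(p):
    """Closed form stated in the example: (p - 1/2)^2 / 3 + 1/36."""
    return (p - 0.5) ** 2 / 3 + 1 / 36


def risk_constant_half(p):
    """Risk of the estimate that ignores X and always reports 1/2
    (a comparison chosen for this illustration, not taken from the paper)."""
    return (p - 0.5) ** 2


if __name__ == "__main__":
    for k in range(0, 101, 25):
        p = 1 / 3 + k / 300
        print(round(p, 4), risk_mle(p), risk_formula(p), risk_constant_half(p))
```

On the interval $\tfrac{1}{3} \le p \le \tfrac{2}{3}$ the constant estimate has risk at most $\tfrac{1}{36}$, which is the minimum of the maximum likelihood risk, so the maximum likelihood estimate is inadmissible here.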

The choice of the loss function in any problem should in theory be governed by [...]. In fact the circumstances of statistical problems do not usually offer compelling reasons for using one loss function rather than another. Considerations of mathematical facility are often determining. Thus, various classical unbiased estimates become minimax estimates when the loss function is judiciously chosen. For, if we take as loss function the ratio of squared error to the variance of the unbiased estimate, the risk becomes constant, and we can easily obtain the classical estimates as minimax estimates in the familiar binomial, Poisson, and rectangular problems, and in some of the nonparametric problems considered in section 6.

However, this approach seems to be somewhat artificial, and hereafter we shall restrict ourselves to a single loss function, namely the squared error. There are two reasons for this choice. With squared error for the loss, the mathematical problems are rather simple. And as was remarked above, squared error (if one restricts oneself to unbiased estimates) is the traditional loss function. Fortunately, the squared error loss function is convex, and hence theorem 3.2 permits us to avoid considering randomized estimates.

When the loss function is squared error, we have the following obvious linearity property, which for later reference we state as

THEOREM 4.2. If $f(X)$ is the minimax estimate for $g(P)$, then $af(X) + b$ is the minimax estimate for $a \cdot g(P) + b$.

However, as we shall show by an example in the next section, it need not be true that if $X_1, \cdots, X_n$ are independent and $f_i(X_i)$ is the minimax estimate for $g_i(P_i)$, $i = 1, \cdots, n$, then $\sum_{i=1}^n a_i f_i(X_i)$ is the minimax estimate for $\sum_{i=1}^n a_i g_i(P_i)$. This is a definite disadvantage of the minimax principle as compared with the Markoff principle which does possess the linearity property mentioned.

We conclude this section with an explicit solution of the Bayes problem in the squared error case. If the distribution $P$ is itself a random variable distributed over $\mathcal{F}$ according to some distribution $\lambda$, we may compare estimates $f$ by means of their expected loss $Q(f) = E[g(P) - f(X)]^2$. Since $Q(f) = E\{E[(g(P) - f(X))^2 \mid X]\}$, it is well known that $Q(f)$ is minimized by using the estimate $f(x) = E[g(P) \mid x]$, provided the conditional measures exist. In fact, this result holds even without this assumption.

THEOREM 4.3. $E[g(P) - f(X)]^2$ is minimized by $f(x) = E[g(P) \mid x]$.

PROOF.

$E[g(P) - f(X)]^2 - E\{g(P) - E[g(P) \mid X]\}^2 = E\{E[g(P) \mid X] - f(X)\}^2 + 2E\{E[(g(P) - E[g(P) \mid X])(E[g(P) \mid X] - f(X)) \mid X]\} \ge 0.$

In applications it is convenient to write $E[g(P) \mid X]$ more explicitly. Suppose that with respect to some measure $\mu$ over $\mathcal{X}$, each distribution $P \in \mathcal{F}$ has a generalized probability density $p_P$, so that for any $A$, the probability that $X \in A$ computed for $P$ is given by

$\int_A p_P(x) \, d\mu(x).$

Minimizing a quadratic function, one finds that

(4.2)    $f(x) = \dfrac{\int_{\mathcal{F}} g(P) \, p_P(x) \, d\lambda(P)}{\int_{\mathcal{F}} p_P(x) \, d\lambda(P)}$

is a Bayes solution.

5. Binomial and hypergeometric distributions. In the present section we shall consider three discrete minimax problems.

PROBLEM 1. (Binomial.) Let $X$ be a binomial random variable with parameter $p$, $0 \le p \le 1$, so that $P(X = k) = \binom{n}{k} p^k q^{n-k}$, where $q = 1 - p$. We shall show that the minimax estimate for $p$ is

(5.1)    $\dfrac{X}{n} \cdot \dfrac{\sqrt{n}}{\sqrt{n} + 1} + \dfrac{1}{2(\sqrt{n} + 1)}.$

Consider any linear estimate $\alpha X + \beta$. The risk $E(\alpha X + \beta - p)^2$ is a quadratic function of $p$ which is constantly equal to $\beta^2$ when

$\alpha = \dfrac{1}{\sqrt{n}(1 + \sqrt{n})}, \qquad \beta = \dfrac{1}{2(1 + \sqrt{n})}.$

Hence (5.1) is a constant risk estimate of $p$. Since, as is easily seen,

$\dfrac{\int_0^1 p \cdot p^{k+a-1} q^{n-k+b-1} \, dp}{\int_0^1 p^{k+a-1} q^{n-k+b-1} \, dp} = \dfrac{a + k}{a + b + n}, \qquad (a, b > 0),$

it follows that (5.1) is the Bayes estimate when $p$ is distributed with probability density $C(pq)^{(\sqrt{n}/2) - 1}$, and hence by Theorem 2.1 we conclude that (5.1) is the minimax estimate of $p$.
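Both halves of this argument lend themselves to a quick numerical check. The sketch below is illustrative only, with function names chosen here: it computes the exact risk of (5.1) and compares (5.1) with the posterior mean $(a + k)/(a + b + n)$ for $a = b = \sqrt{n}/2$.

```python
from math import comb, sqrt


def minimax_estimate(x, n):
    """The estimate (5.1), written as alpha * x + beta."""
    r = sqrt(n)
    return x / (r * (1 + r)) + 1 / (2 * (1 + r))


def risk(estimate, n, p):
    """Exact squared-error risk under Binomial(n, p)."""
    return sum(
        comb(n, x) * p**x * (1 - p) ** (n - x) * (estimate(x, n) - p) ** 2
        for x in range(n + 1)
    )


def beta_posterior_mean(x, n, a, b):
    """Posterior mean of p given X = x under a prior density proportional
    to p^(a-1) q^(b-1)."""
    return (a + x) / (a + b + n)


if __name__ == "__main__":
    for n in (1, 4, 25):
        constant = 1 / (4 * (1 + sqrt(n)) ** 2)
        print(n, constant, [risk(minimax_estimate, n, p) for p in (0.0, 0.3, 1.0)])
```

The printed risks agree with $\beta^2 = 1/[4(1 + \sqrt{n})^2]$ for every $p$, which is the constant-risk property used in the appeal to Theorem 2.1.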

After obtaining this result we were informed that it had been obtained earlier by H. Rubin, to whom, therefore, the priority belongs.

It is interesting to compare the risk of the above estimate with that of the standard unbiased estimate $X/n$. We have

$E\left(\dfrac{X}{n} - p\right)^2 = \dfrac{pq}{n},$

while the risk of (5.1) is $1/[4(1 + \sqrt{n})^2]$. As is easily seen, the risk of (5.1) does not exceed that of $X/n$ if and only if

$\left| p - \tfrac{1}{2} \right| \le \dfrac{\sqrt{1 + 2\sqrt{n}}}{2(1 + \sqrt{n})}.$

Thus the standard estimate is better than the minimax estimate outside an interval around $p = \tfrac{1}{2}$ whose length decreases with increasing $n$, tending to 0 as $n$ tends to infinity. However, for very small values of $n$ the minimax estimate has the smaller risk over nearly the whole range.
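The half-width of the interval on which (5.1) has the smaller risk can be tabulated as follows. This is a small sketch with names chosen for illustration; it uses only the two risk expressions $pq/n$ and $1/[4(1 + \sqrt{n})^2]$.

```python
from math import sqrt


def half_width(n):
    """Half-length of the interval around p = 1/2 on which the minimax
    estimate has risk no larger than X/n."""
    return sqrt(1 + 2 * sqrt(n)) / (2 * (1 + sqrt(n)))


def risk_standard(n, p):
    """Risk of X/n, namely p(1 - p)/n."""
    return p * (1 - p) / n


def risk_minimax(n):
    """Constant risk of the minimax estimate."""
    return 1 / (4 * (1 + sqrt(n)) ** 2)


if __name__ == "__main__":
    for n in (1, 4, 25, 100, 10000):
        print(n, half_width(n))
```

For $n = 1$ the half-width is about 0.433, which reproduces the interval $0.07 \le p \le 0.93$ quoted in the introduction.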

PROBLEM 2. (Difference of binomials.) Let $X$ and $Y$ be independent binomial random variables, where $P(X = k) = \binom{n}{k} p_1^k (1 - p_1)^{n-k}$ and $P(Y = l) = \binom{n}{l} p_2^l (1 - p_2)^{n-l}$. By use of theorem 2.1 we shall show that the minimax estimate for $p_1 - p_2$ is

$\dfrac{\sqrt{2n}}{1 + \sqrt{2n}} \left( \dfrac{X}{n} - \dfrac{Y}{n} \right).$

For the set $\omega$ of theorem 2.1 we take $p_1 = 1 - p_2 = p$, $0 \le p \le 1$, and we let $Z = X + n - Y$. Applying the result of Problem 1 to $Z$, we find the minimax estimate of $p$ to be $\alpha_{2n} \cdot Z + \beta_{2n}$, and by Theorem 4.2 the minimax estimate based on $Z$ for $p_1 - p_2 = 2p - 1$ is

$\dfrac{\sqrt{2n}}{1 + \sqrt{2n}} \left( \dfrac{X}{n} - \dfrac{Y}{n} \right),$

and the risk of this estimate is constant over $\omega$.

To prove that this is also the minimax estimate of $p_1 - p_2$ for the original problem, we consider the risk as a function of $p_1$ and $p_2$. It is easy to show that

$(1 + \sqrt{2n})^2 R(p_1, p_2) = 2[p_1(1 - p_1) + p_2(1 - p_2)] + (p_1 - p_2)^2.$

Finally it can be shown that $p_1(1 - p_1) + p_2(1 - p_2)$ is maximized, subject to the condition that $p_1 - p_2$ be constant, when $p_1 + p_2 = 1$.
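The risk identity for the difference of two binomials can be confirmed by exact summation for a small $n$. The sketch below is an illustration with hypothetical function names, not the authors' computation.

```python
from math import comb, sqrt


def pmf(n, p, k):
    """Binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def difference_estimate(x, y, n):
    """The estimate sqrt(2n)/(1 + sqrt(2n)) * (x/n - y/n)."""
    c = sqrt(2 * n) / (1 + sqrt(2 * n))
    return c * (x - y) / n


def risk(n, p1, p2):
    """Exact squared-error risk of the estimate of p1 - p2."""
    return sum(
        pmf(n, p1, x) * pmf(n, p2, y) * (difference_estimate(x, y, n) - (p1 - p2)) ** 2
        for x in range(n + 1)
        for y in range(n + 1)
    )


def scaled_risk_formula(p1, p2):
    """Right-hand side of the identity for (1 + sqrt(2n))^2 R(p1, p2)."""
    return 2 * (p1 * (1 - p1) + p2 * (1 - p2)) + (p1 - p2) ** 2


if __name__ == "__main__":
    n = 3
    for p1, p2 in ((0.2, 0.8), (0.5, 0.5), (0.2, 0.3), (0.9, 0.9)):
        print(p1, p2, (1 + sqrt(2 * n)) ** 2 * risk(n, p1, p2), scaled_risk_formula(p1, p2))
```

Writing $t = p_1 + p_2$, the right-hand side reduces to $t(2 - t)$, which is at most 1 and equals 1 exactly when $p_1 + p_2 = 1$, so the risk is largest on the set $\omega$.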

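PROBLEM 3. (Hypergeometric.) We finally consider the problem of estimating the number of defectives in a lot from a sample drawn from this lot at random. We denote by $N$ and $n$ the number of elements in lot and sample respectively, and by $D$ and $X$ the corresponding number of defectives. For later reference we note

$P(X = k) = \dfrac{\binom{D}{k} \binom{N - D}{n - k}}{\binom{N}{n}}, \qquad E(X) = \dfrac{nD}{N}, \qquad \sigma_X^2 = \dfrac{nD(N - n)(N - D)}{N^2 (N - 1)}.$

As in Problem 1 we easily find a linear function of $X$ whose risk is constant. In fact

$E(\alpha X + \beta - D)^2 = \beta^2$

when

$\alpha = \dfrac{N}{n + \sqrt{\dfrac{n(N - n)}{N - 1}}}, \qquad \beta = \dfrac{N}{2} \left( 1 - \dfrac{\alpha n}{N} \right).$

To prove that $\alpha X + \beta$ is the minimax estimate of $D$ we shall show that it is the Bayes estimate corresponding to

(5.2)    $P(D = d) = \int_0^1 \binom{N}{d} p^d q^{N - d} \cdot C p^{a - 1} q^{b - 1} \, dp,$

where $a, b > 0$, and

$C = \dfrac{\Gamma(a + b)}{\Gamma(a) \Gamma(b)}.$

In this connection it is useful to notice that since (5.2) is a distribution

(5.3)    $\sum_{d=0}^{N} \binom{N}{d} \dfrac{\Gamma(a + d) \, \Gamma(N + b - d) \, \Gamma(a + b)}{\Gamma(N + a + b) \, \Gamma(a) \, \Gamma(b)} = 1.$

Using theorem 4.3, we find the Bayes estimate associated with (5.2) to be

$f(k) = \dfrac{\sum_d d \binom{d}{k} \binom{N - d}{n - k} \binom{N}{d} \Gamma(a + d) \, \Gamma(N + b - d)}{\sum_d \binom{d}{k} \binom{N - d}{n - k} \binom{N}{d} \Gamma(a + d) \, \Gamma(N + b - d)}.$

Replacing $d$ by $(d + a) - a$, and using the relation

$\binom{d}{k} \binom{N - d}{n - k} \binom{N}{d} = \binom{N}{n} \binom{n}{k} \binom{N - n}{d - k},$

we find:

$f(k) = \dfrac{\sum_{i=0}^{N - n} \binom{N - n}{i} \Gamma(k + i + a + 1) \, \Gamma(N + b - k - i)}{\sum_{i=0}^{N - n} \binom{N - n}{i} \Gamma(k + i + a) \, \Gamma(N + b - k - i)} - a.$

Now apply (5.3) to numerator and denominator separately; then

$f(k) = \dfrac{(k + a)(a + b + N)}{a + b + n} - a = \dfrac{a + b + N}{a + b + n} \, k + \dfrac{a(N - n)}{a + b + n}.$

Putting $\dfrac{a + b + N}{a + b + n} = \alpha$, $\dfrac{a(N - n)}{a + b + n} = \beta$, we get

$a = \dfrac{\beta}{\alpha - 1}, \qquad b = \dfrac{N - \alpha n - \beta}{\alpha - 1}.$

Substituting the values of $\alpha$ and $\beta$ one finds that $\beta > 0$, $N > \alpha n + \beta$ and that $\alpha > 1$ when $N > n + 1$. In the special case $N = n$ the result is immediate, while if $N = n + 1$, the result is obtained by giving to $D$ a binomial distribution with $p = \tfrac{1}{2}$.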
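The constant-risk property of $\alpha X + \beta$ in the hypergeometric problem can be checked by exact enumeration. In the sketch below the expression used for $\beta$, namely $(N - \alpha n)/2$, is derived from the constant-risk condition and is an assumption of this illustration, since the printed expression is not legible in the source; the function names are likewise illustrative.

```python
from math import comb, sqrt


def hypergeom_pmf(N, n, D, k):
    """P(X = k) when n items are drawn without replacement from a lot of N
    containing D defectives."""
    return comb(D, k) * comb(N - D, n - k) / comb(N, n)


def constant_risk_coefficients(N, n):
    """Coefficients (alpha, beta) of the linear estimate alpha * X + beta.
    beta = (N - alpha * n) / 2 is an assumption derived for this sketch."""
    c = sqrt(n * (N - n) / (N - 1))
    alpha = N / (n + c)
    beta = (N - alpha * n) / 2
    return alpha, beta


def risk(N, n, D):
    """Exact squared-error risk of alpha * X + beta as an estimate of D."""
    alpha, beta = constant_risk_coefficients(N, n)
    return sum(
        hypergeom_pmf(N, n, D, k) * (alpha * k + beta - D) ** 2
        for k in range(n + 1)
    )


if __name__ == "__main__":
    N, n = 10, 4
    alpha, beta = constant_risk_coefficients(N, n)
    print(alpha, beta, beta**2)
    print([round(risk(N, n, D), 10) for D in range(N + 1)])
```

With these coefficients the risk equals $\beta^2$ for every $D = 0, 1, \cdots, N$, in agreement with the display $E(\alpha X + \beta - D)^2 = \beta^2$.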





6. Nonparametric problems. We shall in this section consider estimation problems in which the functional form of the distribution of $X$ is not assumed known. Restrictions will be imposed on the variables only to insure the existence of estimates with bounded risk. The problem will be treated under two different such restrictions: (i) that the variables are bounded with known bounds, (ii) that the variables have bounded variances.

In the first of these cases we can assume without loss of generality that the variables are distributed over the interval $[0, 1]$, and then obtain

THEOREM 6.1. Let $X_1, \cdots, X_n$ be independently distributed over $[0, 1]$ according to a joint distribution belonging to a family $\mathcal{F}$. Suppose that $\mathcal{F}$ contains the subfamily $\mathcal{F}_0$ according to which $X_1, \cdots, X_n$ are independently and identically distributed with $P(X_i = 1) = p$, $P(X_i = 0) = 1 - p$, $0 \le p \le 1$. Let $E(X_i) = \mu_i$, $\frac{1}{n} \sum_{i=1}^n \mu_i = \mu$. Then the minimax estimate of $\mu$ is

(6.1)    $\dfrac{\sqrt{n}}{1 + \sqrt{n}} \, \bar{X} + \dfrac{1}{2(1 + \sqrt{n})}.$
PROOF. Since (6.1) is the minimax estimate of $\mu = p$ when the distribution of the X's is known to belong to $\mathcal{F}_0$, we only need to show that its risk is largest for the distributions of $\mathcal{F}_0$. But

$E(A\bar{X} + B - \mu)^2 = A^2\sigma_{\bar{X}}^2 + [B + (A-1)\mu]^2 = \dfrac{A^2}{n^2}\sum_{i=1}^{n}\sigma_i^2 + [B + (A-1)\mu]^2$

and

$\sum \sigma_i^2 = \sum E(X_i^2) - \sum \mu_i^2 \le n\mu - \sum(\mu_i - \mu)^2 - n\mu^2 \le n\mu(1-\mu),$

where equality holds for the distributions in $\mathcal{F}_0$.
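The two facts used in the proof, constant risk on the binomial subfamily and smaller risk elsewhere, can be illustrated with a few lines of arithmetic. This is a sketch only: it treats the identically distributed case, and the function names and the choice $n = 25$ are assumptions made for the illustration.

```python
from math import sqrt


def risk(n, mu, var):
    """Risk of (6.1) when the X's are i.i.d. with mean mu and variance var."""
    A = sqrt(n) / (1 + sqrt(n))
    B = 0.5 / (1 + sqrt(n))
    # E(A*Xbar + B - mu)^2 = A^2 * var / n + (B + (A - 1) * mu)^2
    return A * A * var / n + (B + (A - 1) * mu) ** 2


def binomial_risk(n, p):
    """Risk on the subfamily F_0, where var = p(1 - p)."""
    return risk(n, p, p * (1 - p))


if __name__ == "__main__":
    n = 25
    print("constant:", 1 / (4 * (1 + sqrt(n)) ** 2))
    for p in (0.1, 0.3, 0.5, 0.9):
        print(p, binomial_risk(n, p))
    # A uniform distribution on [0, 1] has mean 1/2 and variance 1/12.
    print("uniform:", risk(n, 0.5, 1 / 12))
```

On the binomial subfamily the risk equals $1/[4(1+\sqrt{n})^2]$ for every $p$; any other distribution on [0, 1] with the same mean has variance at most $\mu(1-\mu)$ and so cannot exceed that value.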

COROLLARY 6.2. Let $X_1, \dots, X_n$ be a sample from an unknown univariate distribution over [0, 1]. Then the minimax estimate of $E(X_i) = \mu$ is given by (6.1).

COROLLARY 6.3. Let $X_1, \dots, X_n$ be a sample from an unknown absolutely continuous univariate distribution over [0, 1]. Then the minimax estimate of $E(X_i) = \mu$ is given by (6.1).

Corollary 6.3 follows from the fact that any risk function that can be obtained for binomial distributions can be approximated by means of absolutely continuous distributions.

Theorem 6.1 can be extended to include variables that are negatively correlated. Namely, if $X_1, \dots, X_n$ are distributed over [0, 1] according to a joint distribution belonging to some family $\mathcal{F}$, if for each distribution of $\mathcal{F}$ the correlation coefficient $\rho_{ij}$ of $X_i$, $X_j$ is $\le 0$ for all $i, j$, and if $\mathcal{F}$ contains the family $\mathcal{F}_0$ of theorem 6.1, then the conclusion of this theorem remains valid. This result can be used for example in the following situation. Suppose a sample of n is taken from a lot of unknown size, and suppose it is desired to estimate the proportion p of defectives in the lot. If k is the number of defectives in the sample, it follows from the above remarks that the minimax estimate of p is

$\dfrac{1}{1+\sqrt{n}}\left(\dfrac{k}{\sqrt{n}} + \dfrac{1}{2}\right).$





It should be pointed out that this result rests on the fact that no bound is known for the lot size. If it is known that the lot size $N$ satisfies $N \le N_0$, then the minimax estimate is that found for the hypergeometric distribution with $N = N_0$.

Next let us consider estimating the difference of the means of two groups of variables.

THEOREM 6.4. Let $X_1, \dots, X_n$; $Y_1, \dots, Y_n$ be independently distributed over the interval [0, 1] according to a joint distribution belonging to a family $\mathcal{F}$. Suppose that $\mathcal{F}$ contains the subfamily $\mathcal{F}_0$, according to which $X_1, \dots, X_n$; $Y_1, \dots, Y_n$ are two samples with $P(X_i = 1) = p_1$, $P(X_i = 0) = 1 - p_1$, $P(Y_i = 1) = p_2$, $P(Y_i = 0) = 1 - p_2$, $0 < p_1, p_2 < 1$. If $E(X_i) = \mu_i$, $E(Y_i) = \nu_i$, $\frac{1}{n}\sum \mu_i = \mu$, $\frac{1}{n}\sum \nu_i = \nu$, then the minimax estimate of $\mu - \nu$ is

(6.2)  $\dfrac{\sqrt{2n}}{1 + \sqrt{2n}}\,(\bar{X} - \bar{Y}).$

PROOF. Again, since (6.2) is the minimax estimate in the binomial case (Problem 2 of section 5), we need only verify that its risk is a maximum in $\mathcal{F}_0$. But

$E[A(\bar{X} - \bar{Y}) - (\mu - \nu)]^2 = E[A(\bar{X} - \mu) - A(\bar{Y} - \nu) + (A - 1)(\mu - \nu)]^2 = A^2(\sigma_{\bar{X}}^2 + \sigma_{\bar{Y}}^2) + (A - 1)^2(\mu - \nu)^2,$

of which we already have shown that it is maximized in the binomial case.

Up to now we assumed the variables to be bounded. Let us now assume instead that the variances are bounded. With this assumption we can give an analogue of the classical Markoff theorem on least squares.

THEOREM 6.5. Suppose that $X_1, \dots, X_n$ are independently distributed according to a joint distribution belonging to some family $\mathcal{F}$, which contains the subfamily $\mathcal{F}_0$ where the X's are normal with variance $M^2$. Suppose that for all distributions in $\mathcal{F}$, $E(X_i) = \sum_{j=1}^{s} a_{ij}\theta_j$ and $\sigma_{X_i}^2 \le M^2$. We assume the matrix $(a_{ij})$ to be known and of rank $s \le n$. Then the estimate $[f_1(X), \dots, f_s(X)]$ of $(\theta_1, \dots, \theta_s)$ which minimizes $\sup E \sum_{i=1}^{s} [f_i(X) - \theta_i]^2$ is the Markoff estimate.

PROOF. Consider first the subfamily $\mathcal{F}_0$. Then there exists an orthogonal transformation to $Y_1, \dots, Y_n$ such that $E(Y_i) = k_i\theta_i$ for $i = 1, \dots, s$, where $k_i > 0$; $E(Y_i) = 0$ for $i = s+1, \dots, n$, and $\sigma_{Y_i}^2 = M^2$ for all $i$. Then $(Y_1, \dots, Y_s)$ is a sufficient statistic for $(\theta_1, \dots, \theta_s)$, and it is easily shown, using the methods of [6], that $(Y_1/k_1, \dots, Y_s/k_s)$ is a minimax estimate for $(\theta_1, \dots, \theta_s)$. But this is the Markoff estimate. In order to complete the proof we must show that the risk of this estimate takes on in $\mathcal{F}_0$ its supremum over $\mathcal{F}$. But this is immediate, for

$E\sum_{i=1}^{s}[f_i(X) - \theta_i]^2 = E\sum_{i=1}^{s}\left(\dfrac{Y_i}{k_i} - \theta_i\right)^2 \le \sum_{i=1}^{s}\dfrac{M^2}{k_i^2}.$





In a similar manner it is easily shown that the least squares estimate for a linear function of one or more of the $\theta$'s is the minimax estimate.

Theorem 6.5 is a justification of the least squares estimate different from that of the Markoff theorem. In the Markoff theorem, it is shown that the least squares estimate has uniformly smallest risk among all linear unbiased estimates; here it is seen that the least squares estimate minimizes the maximum risk among all estimates. (The assumptions concerning variances also differ.)

7. Prediction problems. Frequently one is interested in estimating the value of a random variable rather than that of a parameter. A customary method for this is to estimate the expectation of the random variable (a parameter) and then to "identify" the variable and its expectation; i.e., to use the estimate of the expectation as a prediction for the variable. As we shall see below one is led to this procedure if one adopts the point of view of unbiased estimation, so that from this point of view prediction poses no new problem. This however is no longer true when one employs the minimax principle.

Consider a pair X, Y of random variables having a joint distribution P belonging to a family $\mathcal{F}$ of distributions. It is desired to use the observed X to predict, say, $g(Y)$. We are interested in minimax predictions, i.e., functions $f(X)$ which minimize $\sup_P E_P W[g(Y), f(X)]$. To obtain minimax predictions we need the following analogue of Theorem 2.1.

THEOREM 7.1. Let $\{P_\theta\}$, $\theta \in \omega$, be a parametric subfamily of $\mathcal{F}$, and let $\lambda$ be a probability measure on $\omega$. Suppose that $f$ is such that $\int E_\theta W[g(Y), f(X)]\, d\lambda(\theta)$ is minimized, and that

(i) $E_\theta W[g(Y), f(X)]$ is constant, say $= c$, for all $\theta \in \omega$,

(ii) $E_P W[g(Y), f(X)] \le c$ for all $P \in \mathcal{F}$.

Then $f$ is a minimax prediction for $g(Y)$.

The proof is the exact analogue of that of theorem 2.1.

COROLLARY 7.2. A constant risk Bayes prediction is a minimax prediction.

Suppose now that X and Y are independent and that $W[g(y), f(x)] = [g(y) - f(x)]^2$. Consider the problem first from the point of view of unbiasedness. A prediction could reasonably be called unbiased if $E_P f(X) = E_P g(Y)$. Subject to unbiasedness, the risk is given by $E_P[g(Y) - f(X)]^2 = \sigma_P^2 f(X) + \sigma_P^2 g(Y)$. But $\sigma_P^2 g(Y)$ is a known function of P, and hence the problem of minimizing (for a particular P) the expected squared error reduces to that of finding an unbiased estimate of $E_P g(Y)$ with minimum variance at P. In a similar way one sees, without any restriction to unbiased predictions, that the Bayes prediction for $g(Y)$ is the same as the Bayes estimate for $E_P g(Y)$, and hence that formula (4.2), with $g(P)$ replaced by $E_P g(Y)$, may be used if the assumptions there made are valid.

One might expect that as in the unbiased theory the prediction will coincide with the estimate. This however is not the case since the $\lambda$'s that give constant risk in the two cases will usually be distinct. In fact the two problems are rather different, in that the "least favorable" $\lambda$ for the prediction problem must not only take into account the difficulty of finding the correct value of $\theta$ for various a priori distributions but also the difficulty of predicting Y even when $\theta$ is known.

As a first example consider the prediction analogue of problem 1 of section 5. Let X, Y be independent binomial variables such that

$P(X = x) = \binom{m}{x} p^x q^{m-x}, \qquad P(Y = y) = \binom{n}{y} p^y q^{n-y}.$

We shall obtain the minimax prediction of $Y/n$ in a manner quite analogous to the one in which we obtained the minimax estimate of p. Actually, the present problem is a generalization of the earlier one, to which it can be reduced by letting $n \to \infty$. First it is easily seen that

$E\left(\alpha\dfrac{X}{m} + \beta - \dfrac{Y}{n}\right)^2$

is a quadratic function of p, which when $m > 1$ is constant for

$\alpha = \dfrac{m}{m-1}\left[1 - \sqrt{\dfrac{1}{m} + \dfrac{1}{n} - \dfrac{1}{mn}}\,\right], \qquad \beta = \dfrac{1 - \alpha}{2}.$

But we have already seen that $\alpha\dfrac{X}{m} + \beta$ is the Bayes solution corresponding to $C p^{a-1} q^{b-1}$ when

$\alpha = \dfrac{m}{m + a + b}, \qquad \beta = \dfrac{a}{m + a + b}.$

Clearly $a = b$, and $a > 0$ provided $0 < \alpha < 1$, which is easily verified when $m, n > 1$. We note that as $n \to \infty$, the values of $\alpha$, $\beta$ tend to those of the minimax estimate of p.

When $m = 1$, $E\left(\alpha X + \beta - \dfrac{Y}{n}\right)^2$ is constant for $\alpha = \dfrac{n-1}{2n}$, $\beta = \dfrac{n+1}{4n}$, and again $\alpha X + \beta$ is the Bayes estimate for a beta distribution when $n > 1$, hence minimax.

Finally in the case $n = 1$, the situation degenerates. Since $E(Y - \tfrac{1}{2})^2 = \tfrac{1}{4}$, the prediction $f(X) = \tfrac{1}{2}$ has constant risk. In addition it is the Bayes prediction corresponding to the distribution which assigns probability 1 to $p = \tfrac{1}{2}$. Hence in this case, regardless of the value of X, one would predict for Y the value $\tfrac{1}{2}$.
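The constant-risk property of the prediction can be verified directly from the variance decomposition $E(\alpha X/m + \beta - Y/n)^2 = (\alpha^2/m + 1/n)pq + [\beta + (\alpha - 1)p]^2$. The sketch below is illustrative only; the function names and the sample sizes are assumptions, and floating point arithmetic is used for brevity.

```python
from math import sqrt


def coefficients(m, n):
    """alpha and beta of the prediction alpha*X/m + beta of Y/n, for m > 1."""
    alpha = m / (m - 1) * (1 - sqrt(1 / m + 1 / n - 1 / (m * n)))
    beta = (1 - alpha) / 2
    return alpha, beta


def risk(m, n, p, alpha, beta):
    """E(alpha*X/m + beta - Y/n)^2 for independent binomials with common p."""
    q = 1 - p
    return (alpha ** 2 / m + 1 / n) * p * q + (beta + (alpha - 1) * p) ** 2


if __name__ == "__main__":
    m, n = 4, 5
    alpha, beta = coefficients(m, n)
    print("alpha, beta:", alpha, beta)
    for p in (0.05, 0.25, 0.5, 0.8):
        print(p, risk(m, n, p, alpha, beta))  # the same value for every p
    # For very large n the coefficient approaches sqrt(m) / (1 + sqrt(m)).
    print(coefficients(16, 10 ** 9)[0], 4 / 5)
```

The common value of the risk is $\beta^2$, which is what the quadratic in $p$ reduces to once its first and second order coefficients vanish.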

It is interesting that the above prediction problem can be interpreted also as an estimation problem in the following manner. Suppose a lot of size $N = m + n$ is such that the number of defectives follows a binomial distribution; this is the case when the items making up the lot are produced by a manufacturing process that is in statistical control. It is desired to estimate, from a sample of size m taken from this lot, the proportion of defectives in the remainder. That this is exactly the problem treated above follows from a remark of Mood [13] that in such a lot the number of defectives in the sample and in the remainder are independently distributed according to binomial distributions with the same p.





We can again use the binomial results to obtain the solutions of certain nonparametric problems. For example, let $X_1, \dots, X_m$ be independently and identically distributed on [0, 1] and let $Y_1, \dots, Y_n$ be another sample from the same distribution. Then the minimax prediction for $\bar{Y}$ is given by $\alpha\bar{X} + \beta$ with

$\alpha = \dfrac{m}{m-1}\left[1 - \sqrt{\dfrac{1}{m} + \dfrac{1}{n} - \dfrac{1}{mn}}\,\right], \qquad \beta = \dfrac{1 - \alpha}{2}.$

To see this, note that

$E(\alpha\bar{X} + \beta - \bar{Y})^2 = E[\alpha(\bar{X} - \mu) - (\bar{Y} - \mu) + (\beta + (\alpha - 1)\mu)]^2$

$= \left(\dfrac{\alpha^2}{m} + \dfrac{1}{n}\right)\sigma^2 + [\beta + (\alpha - 1)\mu]^2$

$\le \left(\dfrac{\alpha^2}{m} + \dfrac{1}{n}\right)\mu(1 - \mu) + [\beta + (\alpha - 1)\mu]^2.$

An analogous modification clearly is possible for theorem 6.4.

For the situation considered in 6.5, the prediction problem gives the same result as the estimation problem. For consider first two samples $X_1, \dots, X_m$; $Y_1, \dots, Y_n$ from a normal distribution with known variance $\sigma^2$. Here

$E[f(X_1, \dots, X_m) - \bar{Y}]^2 = E[f(X_1, \dots, X_m) - \theta]^2 + \dfrac{\sigma^2}{n},$

and hence the risk differs from that of the estimation problem only by a constant. Thus $\bar{X}$ is the minimax prediction of $\bar{Y}$, and it is then seen immediately that it is also the minimax prediction for $\bar{Y}$ when of the underlying common distribution of the X's and Y's it is assumed only that the variance is bounded.

REFERENCES

[1] G. W. Brown, "On small sample estimation," Annals of Math. Stat., Vol. 18 (1947), p. 514.

[2] A. Wald, "Contributions to the theory of statistical estimation and testing hypotheses," Annals of Math. Stat., Vol. 10 (1939), p. 299.

[3] A. Wald, On the Principles of Statistical Inference, Notre Dame Math. Lectures, No. 1 (1942).

[4] A. Wald, "Statistical decision functions which minimize the maximum risk," Annals of Math., Vol. 46 (1945), p. 265.

[5] A. Wald, "Statistical decision functions," Annals of Math. Stat., Vol. 20 (1949), p. 165.

[6] C. Stein and A. Wald, "Sequential confidence intervals for the mean of a normal distribution with known variance," Annals of Math. Stat., Vol. 18 (1947), p. 427.

[7] P. R. Halmos, "The theory of unbiased estimation," Annals of Math. Stat., Vol. 17 (1946), p. 31.

[8] E. L. Lehmann and C. Stein, "Most powerful tests of composite hypotheses. I. Normal distributions," Annals of Math. Stat., Vol. 19 (1948), p. 495.

[9] A. Wald, "An essentially complete class of admissible decision functions," Annals of Math. Stat., Vol. 18 (1947), p. 649.

[10] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin, 1933.

[11] D. Blackwell, "Conditional expectation and unbiased sequential estimation," Annals of Math. Stat., Vol. 18 (1947), p. 105.

[12] E. W. Barankin, "Extension of a theorem of Blackwell," Annals of Math. Stat., Vol. 21 (1950), p. 280.

[13] A. M. Mood, "On the dependence of sampling inspection plans upon population distributions," Annals of Math. Stat., Vol. 14 (1943), p. 145.



THE THEORY OF PROBABILITY DISTRIBUTIONS OF POINTS ON A LATTICE¹

By P. V. Krishna Iyer
University of Oxford

1. Introduction and summary. This paper discusses the theory of certain probability distributions arising from points arranged in the form of lattices in two, three and higher dimensions. The points are of k characters which for convenience are described as colors. A two dimensional lattice will consist of m × n points in m columns and n rows. In a three dimensional lattice there will be l × m × n points in the form of a rectangular parallelepiped. Two situations arise for consideration. They are, to use the terms of Mahalanobis, free and non-free sampling. In free sampling the color of each point is determined, on null hypothesis, independently of the color of the other points. The probabilities of the points belonging to the different colors, say black, white, etc., are $p_1, p_2, \dots, p_k$, such that $\sum_{i=1}^{k} p_i = 1$. In non-free sampling the number of points of each color is specified in advance, say $n_1, n_2, \dots, n_k$, so that $\sum n_r = mn$ or $lmn$ according as the lattice is two or three-dimensional. Only the arrangements of these points in the lattice are varied.

The distributions considered in this paper are the following:

(i) the number of joins between adjacent points of the same color, say black-black joins,

(ii) the number of joins between adjacent points of two specified colors, say black-white joins, and

(iii) the total number of joins between points of different colors, along mutually perpendicular axes.

The methods used here are the same as those developed by the author [3] for the linear case. All the distributions tend to the normal form when l, m and n tend to infinity, provided the p's are not very small.

Before considering the various distributions, we shall have a brief review of the work done on this topic by other people. For free sampling, Moran [5] and [6] has discussed the distribution of black-white and black-black joins for an m × n lattice of points of two colors. For a three-dimensional lattice, he has given the first and the second moments for the distribution of black-white joins. Levene [4] has announced some results closely allied to those of Moran for a square of side N (with $N^2$ cells), each cell taking the characteristic A or B with probabilities p and $q = 1 - p$ respectively. Bose [2] has found the expectation of

X = the number of black patches - the number of embedded white patches,

¹ Part of a thesis approved for the degree of Doctor of Philosophy, Oxford University.



for a square divided into $n^2$ small cells, having p and $q = 1 - p$ as the probability of the cells being black or white. An embedded white patch is one that lies completely inside a black patch.

The above review shows that the work done so far is confined entirely to the free sampling distributions, the points taking only two characters. As mentioned in the beginning of this article, we shall deal here with the free and non-free sampling distributions for points possessing k characters or colors.


2. Two dimensional lattice. Let an m × n rectangular lattice consist of mn points of k colors with probabilities $p_1, p_2, \dots, p_k$, such that $\sum p_r = 1$. (When there are only two colors, $p_1$ and $p_2$ are taken as p and q respectively.) All the problems dealt with for the linear lattice (Krishna Iyer, [3]) can be investigated here also. But the most important of them is the distribution for the total number of joins between points of different colors. This takes into consideration the relative position of points of all colors in the lattice. Distributions for the number of black-black or black-white joins are not based on the arrangement of all the points in the lattice and therefore cannot be considered to be adequate for testing the random distribution of the points in the lattice. Therefore the distribution of the total number of joins between points of different colors has been dealt with in some detail. As the actual distributions are very complicated they are discussed by means of cumulants. The first and the second moments for the other distributions have also been given.

2.1. First and second moments for the distribution of black-black joins for two or more colors. The first and the second moments for free sampling have been obtained by Moran [5] and [6]. In order to give an idea of the methods used in this paper for obtaining the moments and also to facilitate the derivation of the corresponding moments for non-free sampling, they have been obtained again for both black-black and black-white joins.

(a) Free sampling. In the course of similar investigations on the distribution of black-black joins arising from points on a line, the author [3] has found that the rth factorial moment is r! times the sum of expectations of the different ways of obtaining r joins. This finding is true for the rectangular lattice also.

This may be established as follows.

Define variates $u_{ij}$ ($i = 1, 2, \dots, n$; $j = 1, 2, \dots, m - 1$) to be one if the $(i, j)$ and $(i, j+1)$ positions are black and zero otherwise; then $E(u_{ij}) = p^2$, and the higher factorial moments are zero. Similarly, define $v_{ij}$ ($i = 1, 2, \dots, n - 1$; $j = 1, 2, \dots, m$) to be one when the $(i, j)$ and $(i+1, j)$ positions are black and zero otherwise; then $E(v_{ij}) = p^2$, and the higher factorial moments are zero. Further, $u_{ij}$ is independently distributed of all u's and v's except $u_{i,j-1}$, $u_{i,j+1}$, $v_{i-1,j}$, $v_{i,j}$, $v_{i-1,j+1}$, $v_{i,j+1}$, and $v_{ij}$ is independently distributed of all u's and v's excepting two vertically adjacent v's and four horizontally adjacent u's. If

$s = \sum u_{ij} + \sum v_{ij},$





then

$E(s) = \sum E(u_{ij}) + \sum E(v_{ij}) = (2mn - m - n)p^2$

and $E(s^{(2)}) = 2E$(the number of ways of selecting any two of the ones included in $\sum u_{ij} + \sum v_{ij}$)

$= 2E\left(\sum uu' + \sum uv + \sum vv'\right)$

involves only the cross products since $E(u^{(2)}) = 0 = E(v^{(2)})$. For products of dependent pairs the expectation is $p^3$, while for independent pairs it is $p^4$. Hence one merely needs to count the number of dependent and independent products. Similarly for the third factorial moment one needs consider only products of three first powers of the variates: those with three independent variates (with expectation $p^6$), those with two dependent and one independent variates (with expectation $p^5$) and those with three dependent variates (with expectation $p^4$).

Thus the second factorial moment can be obtained by counting the number of ways of obtaining two black-black joins from (i) three adjacent points and (ii) two pairs of adjacent points. They are explained below diagrammatically for a 5 × 4 lattice.

[Diagram: arrangements (1), (2) and (3) of black points on a 5 × 4 lattice.]

'X' denotes a black point.

'.' denotes any point other than black. The expectations for items (1), (2) and (3) indicated above are

$[(m - 2)n + (n - 2)m]\,p^3,$

(2.1.1)  $4(m - 1)(n - 1)\,p^3,$

$\tfrac{1}{2}[4m^2n^2 - 4mn(m + n) + m^2 + n^2 - 12mn + 13(m + n) - 8]\,p^4,$






respectively. Thus

(2.1.2)  $\mu'_{[2]} = 2[6mn - 6(m + n) + 4]\,p^3 + [4m^2n^2 - 4mn(m + n) + m^2 + n^2 - 12mn + 13(m + n) - 8]\,p^4.$

It can now be seen that

(2.1.3)  $\mu'_1 = (2mn - m - n)\,p^2,$

(2.1.4)  $\mu_2 = (2mn - m - n)\,p^2 + 2[6mn - 6(m + n) + 4]\,p^3 - (14mn - 13m - 13n + 8)\,p^4.$

Putting $m + n = a$, and $mn = b$, the above expressions reduce to

(2.1.5)  $\mu'_1 = (2b - a)\,p^2,$

(2.1.6)  $\mu_2 = (2b - a)\,p^2 + 2(6b - 6a + 4)\,p^3 - (14b - 13a + 8)\,p^4.$

These substitutions have been continued throughout this Section.

(b) Non-free sampling. The chances of obtaining r black points in free and non-free sampling are $p^r$ and $n_1^{(r)}/b^{(r)}$ respectively, where $x^{(r)} = x(x-1)\cdots(x-r+1)$. Therefore it is obvious that the rth factorial moment about zero for the non-free sampling distribution of black-black joins can be deduced by substituting $n_1^{(s)}/b^{(s)}$ for $p^s$ in $\mu'_{[r]}$ for free sampling.

This substitution gives

(2.1.7)  $\mu'_{1(n_1,n_2)} = \dfrac{(2b - a)\,n_1^{(2)}}{b^{(2)}},$

(2.1.8)  $\mu_{2(n_1,n_2)} = \dfrac{(2b - a)\,n_1^{(2)}}{b^{(2)}} + \dfrac{2(6b - 6a + 4)\,n_1^{(3)}}{b^{(3)}} - \dfrac{\{(14b - 13a + 8) - (2b - a)^2\}\,n_1^{(4)}}{b^{(4)}} - \left[\dfrac{(2b - a)\,n_1^{(2)}}{b^{(2)}}\right]^2,$

where $\mu_{r(n_1,n_2)}$ represents the rth moment with $n_1$ black and $n_2$ white points on the lattice.
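The moment formulas (2.1.5) to (2.1.8) can be checked on a small lattice by enumerating every arrangement. The following is a minimal sketch using exact rational arithmetic; the helper names, the 3 × 3 lattice and the values $p = 1/3$, $n_1 = 4$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, product


def lattice_joins(m, n):
    """Adjacent pairs of points in a lattice with m columns and n rows."""
    joins = []
    for i in range(n):
        for j in range(m):
            if j + 1 < m:
                joins.append((i * m + j, i * m + j + 1))
            if i + 1 < n:
                joins.append((i * m + j, (i + 1) * m + j))
    return joins


def bb_joins(black, joins):
    """Number of joins whose two end points are both black."""
    return sum(1 for u, v in joins if u in black and v in black)


def free_moments(m, n, p):
    """Mean and variance of black-black joins when each point is black w.p. p."""
    joins, b = lattice_joins(m, n), m * n
    m1 = m2 = Fraction(0)
    for colors in product((0, 1), repeat=b):
        black = {k for k, c in enumerate(colors) if c}
        prob = p ** len(black) * (1 - p) ** (b - len(black))
        s = bb_joins(black, joins)
        m1 += prob * s
        m2 += prob * s * s
    return m1, m2 - m1 ** 2


def nonfree_moments(m, n, n1):
    """Mean and variance when exactly n1 of the mn points are black."""
    joins, b = lattice_joins(m, n), m * n
    values = [bb_joins(set(c), joins) for c in combinations(range(b), n1)]
    m1 = Fraction(sum(values), len(values))
    m2 = Fraction(sum(v * v for v in values), len(values))
    return m1, m2 - m1 ** 2


def falling(x, r):
    """Falling factorial x(x-1)...(x-r+1)."""
    out = 1
    for i in range(r):
        out *= x - i
    return out


if __name__ == "__main__":
    m, n, p, n1 = 3, 3, Fraction(1, 3), 4
    a, b = m + n, m * n
    print(free_moments(m, n, p), (2 * b - a) * p ** 2)
    print(nonfree_moments(m, n, n1),
          (2 * b - a) * Fraction(falling(n1, 2), falling(b, 2)))
```

Because the enumeration is exhaustive it is practical only for lattices of a dozen or so points, which is enough to confirm the coefficients in $a$ and $b$.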

2.2. Cumulants for the distribution of black-white joins for two colors. For m points on a line, the author [3] has shown that the first four cumulants of the free and non-free sampling distribution of black-white joins can be obtained from the non-free distributions for (1, m - 1), (2, m - 2), (3, m - 3) and (4, m - 4) black and white points distributed at random. This method is applicable for two and three dimensional lattices also. This can be established from the following considerations.

(i) The rth moment about zero for the free sampling distribution is

$\mu'_r = \sum_{s=0}^{b} \binom{b}{s} p^s q^{b-s}\, S_{r(s,\,b-s)},$

where $b = mn$ and $S_{r(s,\,b-s)}$ is the rth moment for the non-free distribution with s black and $(b - s)$ white points.

(ii) $S_{r(s,\,b-s)}$ is the same for the two distributions arising from (1) s black and $(b - s)$ white points and (2) $(b - s)$ black and s white points.

(iii) The rth moment is a polynomial in pq of degree r. This can be seen from the fact that the factorial moment is the sum of the expectations of the different ways of obtaining r black-white joins. The probability of r independent black-white joins is $(2pq)^r$ and this is the highest power of pq.

* This result differs slightly from that given by Moran. The correct result is the one given here.

In view of the above conditions, (i) reduces to

(2.2.1)  $\mu'_r = A_{1r}\,pq\,(p + q)^{b-2} + A_{2r}\,p^2q^2(p + q)^{b-4} + \cdots + A_{rr}\,p^rq^r(p + q)^{b-2r} = A_{1r}\,pq + A_{2r}\,p^2q^2 + \cdots + A_{rr}\,p^rq^r,$

where $A_{1r}$, $A_{2r}$ etc. are determined from the following relations.

(2.2.2)
$\binom{b}{1} S_{r(1,\,b-1)} = A_{1r},$
$\binom{b}{2} S_{r(2,\,b-2)} = A_{2r} + \binom{b-2}{1} A_{1r},$
$\binom{b}{3} S_{r(3,\,b-3)} = A_{3r} + \binom{b-4}{1} A_{2r} + \binom{b-2}{2} A_{1r},$
$\binom{b}{4} S_{r(4,\,b-4)} = A_{4r} + \binom{b-6}{1} A_{3r} + \binom{b-4}{2} A_{2r} + \binom{b-2}{3} A_{1r},$

where $S_{r(t,\,b-t)}$ is the rth moment about zero for the non-free distribution with t black and $(b - t)$ white points. This is obvious by comparing the coefficients of $p^t q^{b-t}$ in (i) with (2.2.1).

Therefore the first four cumulants can be calculated by finding the frequency distributions of black-white joins for (1, b - 1), (2, b - 2), (3, b - 3) and (4, b - 4) black and white points. These distributions were determined by a systematic examination of the number of black-white joins in all the possible arrangements for the given number of black and white points. The moments of these distributions enable us to determine the A's.
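The triangular system (2.2.2) can be solved mechanically once the non-free distributions are enumerated. The sketch below does this for a 3 × 3 lattice; it is an illustration under stated assumptions (helper names and the lattice size are chosen here), and it works with the sums $\binom{b}{t} S_{r(t,\,b-t)}$ directly so that only integers appear.

```python
from itertools import combinations
from math import comb


def lattice_joins(m, n):
    """Adjacent pairs of points in a lattice with m columns and n rows."""
    joins = []
    for i in range(n):
        for j in range(m):
            if j + 1 < m:
                joins.append((i * m + j, i * m + j + 1))
            if i + 1 < n:
                joins.append((i * m + j, (i + 1) * m + j))
    return joins


def bw_joins(black, joins):
    """Number of joins with exactly one black end point."""
    return sum(1 for u, v in joins if (u in black) != (v in black))


def power_sum(m, n, t, r):
    """C(b, t) * S_r(t, b - t): sum of x^r over all placements of t black points."""
    joins, b = lattice_joins(m, n), m * n
    return sum(bw_joins(set(c), joins) ** r for c in combinations(range(b), t))


def solve_A(m, n, r, tmax):
    """Solve the triangular system (2.2.2) for A_1r, ..., A_tmax,r."""
    b = m * n
    A = {}
    for t in range(1, tmax + 1):
        # Contribution of the lower-order coefficients already determined.
        known = sum(comb(b - 2 * j, t - j) * A[j] for j in range(1, t))
        A[t] = power_sum(m, n, t, r) - known
    return A


if __name__ == "__main__":
    m, n = 3, 3
    a, b = m + n, m * n
    print(solve_A(m, n, 2, 2))
    print(2 * (8 * b - 7 * a + 4),
          4 * (a * a - 4 * a * b + 4 * b * b + 13 * a - 14 * b - 8))
```

For the 3 × 3 lattice the second-moment coefficients come out as 68 and 352, which agree with the closed forms for $A_{12}$ and $A_{22}$ evaluated at $a = 6$, $b = 9$.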

The equations in (2.2.2) give

$A_{11} = 2(2b - a),$

$A_{12} = 2(8b - 7a + 4),$

$A_{13} = 2(32b - 37a + 36),$

$A_{14} = 2(128b - 175a + 220),$

$A_{22} = 4(a^2 - 4ab + 4b^2 + 13a - 14b - 8),$

$A_{23} = 4(21a^2 - 66ab + 48b^2 + 210a - 156b - 228),$

$A_{33} = 8(-a^3 + 6a^2b - 12ab^2 + 8b^3 - 39a^2 + 120ab - 84b^2 - 272a + 184b + 312),$

$A_{24} = 4(295a^2 - 760ab + 448b^2 + 2305a - 1304b - 3428),$

$A_{34} = 8(-42a^3 + 216a^2b - 360ab^2 + 192b^3 - 1410a^2 + 3612ab - 2016b^2 - 7884a + 3648b + 12720),$

$A_{44} = 16(a^4 - 8a^3b + 24a^2b^2 - 32ab^3 + 16b^4 + 78a^3 - 396a^2b + 648ab^2 - 336b^3 + 1643a^2 - 4196ab + 2252b^2 + 7926a - 3084b - 13464),$

where $a = m + n$, and $b = mn$.

The above values of A's give the first four moments for free sampling about zero. The cumulants reduce to the following expressions:

(2.2.3)  $\kappa_1 = 2(2b - a)\,pq,$

(2.2.4)  $\kappa_2 = 2(8b - 7a + 4)\,pq - 4(14b - 13a + 8)\,p^2q^2,$

(2.2.5)  $\kappa_3 = 2(32b - 37a + 36)\,pq - 8(90b - 111a + 114)\,p^2q^2 + 64(29b - 37a + 39)\,p^3q^3,$

(2.2.6)  $\kappa_4 = 2(128b - 175a + 220)\,pq - 4(1784b - 2617a + 3476)\,p^2q^2 + 32(1548b - 2361a + 3228)\,p^3q^3 - 32(3126b - 4899a + 6828)\,p^4q^4.$

As indicated for black-black joins, the first and the second moments for non-free sampling can be calculated by substituting

$p^r q^s = \dfrac{n_1^{(r)}\, n_2^{(s)}}{b^{(r+s)}}$

in the uncorrected moments about the origin for free sampling. This is true for all the distributions considered in this paper.

Before proceeding to discuss the limiting form of the distribution, it may be noted that the first four cumulants for the free-sampling distribution of black-white joins are linear expressions in a and b. This result is similar to what has been established for the linear lattice (Krishna Iyer, [3]). When the points lie on a line, all the cumulants of the distribution of the number of joins (black-black or black-white) are linear in m (the number of points on the line). This suggests that the higher order cumulants for the distribution of joins in a rectangular lattice also will be linear in a and b, i.e. the rth cumulant will be of the form

$\sum_{s=1}^{r} (L_{rs}\,a + M_{rs}\,b + N_{rs})\,p^s q^s,$

where L, M and N are independent of a and b. It has not been possible to obtain a formal proof for this statement.

The limiting form of the distribution of the number of black-white joins is now examined on the basis of the cumulants given above. Since $\kappa_2$, $\kappa_3$ and $\kappa_4$ are linear in a and b, $\gamma_1$ and $\gamma_2$ tend to the limit zero as m and n tend to infinity. That the higher order $\gamma$'s also tend to the limit zero can be seen from the fact that all the cumulants will be linear functions in a and b. Hence the distribution of

$\dfrac{x - 2(2b - a)\,pq}{\sqrt{2(8b - 7a + 4)\,pq - 4(14b - 13a + 8)\,p^2q^2}}$

tends to the normal form as m and n tend to infinity, where x is the observed number of black-white joins in a given arrangement of the points.

When $p = q = \tfrac{1}{2}$, the first, second and third cumulants are equal to those obtained for a binomial distribution whose index is $(2b - a)$ and whose probability is $\tfrac{1}{2}$.

As in the case of linear lattices, the distribution of the number of black-white joins in an m × n rectangular lattice for non-free sampling also will tend to the normal form as m and n tend to infinity.


TABLE 1

Distribution of the number of black-white joins for 2 × 3 lattice

No. of B-W            No. of black points
joins          0    1    2    3    4    5    6    Total
   0           1    -    -    -    -    -    1       2
   1           -    -    -    -    -    -    -       -
   2           -    4    2    -    2    4    -      12
   3           -    2    4    6    4    2    -      18
   4           -    -    5    8    5    -    -      18
   5           -    -    4    4    4    -    -      12
   6           -    -    -    -    -    -    -       -
   7           -    -    -    2    -    -    -       2
 Total                                              64

$\kappa_1 = 7/2$, $\kappa_2 = 7/4$, $\kappa_3 = 0$, $\kappa_4 = 17/8$.


In order to have an idea of the nature of the distribution of the number of black-white joins when p = q or otherwise, the complete distributions for the lattices 2 × 3, 2 × 4, 3 × 3 and 3 × 4 are given in Tables 1, 2, 3, and 4. The distributions tabulated in Tables 1, 2, 3 and 4 show that the probability of getting 1 and $(2b - a - 1)$ black-white joins is zero, while for 0 and $(2b - a)$ it is not so. But this abnormality will not affect the limiting form of the distribution when m and n tend to infinity because the probability for 0 and $(2b - a)$ black-white joins also tends to zero.
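A table of this kind can be regenerated by listing every placement of a given number of black points and counting the black-white joins. The sketch below is one way to do it, assuming a lattice with m columns and n rows indexed row by row; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def lattice_joins(m, n):
    """Adjacent pairs of points in a lattice with m columns and n rows."""
    joins = []
    for i in range(n):
        for j in range(m):
            if j + 1 < m:
                joins.append((i * m + j, i * m + j + 1))
            if i + 1 < n:
                joins.append((i * m + j, (i + 1) * m + j))
    return joins


def join_table(m, n):
    """table[k][x] = number of placements of k black points with x B-W joins."""
    joins, b = lattice_joins(m, n), m * n
    table = []
    for k in range(b + 1):
        column = Counter()
        for black in combinations(range(b), k):
            s = set(black)
            x = sum(1 for u, v in joins if (u in s) != (v in s))
            column[x] += 1
        table.append(column)
    return table


if __name__ == "__main__":
    m, n = 3, 2
    table = join_table(m, n)
    max_joins = 2 * m * n - (m + n)
    for x in range(max_joins + 1):
        row = [table[k][x] for k in range(m * n + 1)]
        print(x, row, sum(row))
```

For the 2 × 3 lattice the row totals are 2, 0, 12, 18, 18, 12, 0, 2, so the counts for 1 and for $2b - a - 1 = 6$ joins vanish as stated in the text.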

2.3. First and second moments for the distribution of black-white joins for k colors. Free sampling. Taking $p_1$ and $p_2$ as the probabilities that a point in the lattice is black or white, the expected number of black-white joins is

(2.3.1)  $2(2b - a)\,p_1p_2.$





TABLE 2

Distribution of the number of black-white joins for 2 × 4 lattice

No. of B-W                 No. of black points
joins          0    1    2    3    4    5    6    7    8    Total
   0           1    -    -    -    -    -    -    -    1       2
   1           -    -    -    -    -    -    -    -    -       -
   2           -    4    2    -    2    -    2    4    -      14
   3           -    4    4    4    -    4    4    4    -      24
   4           -    -    8   12    8   12    8    -    -      48
   5           -    -   12   16   24   16   12    -    -      80
   6           -    -    2   12   20   12    2    -    -      48
   7           -    -    -    8    8    8    -    -    -      24
   8           -    -    -    4    6    4    -    -    -      14
   9           -    -    -    -    -    -    -    -    -       -
  10           -    -    -    -    2    -    -    -    -       2
 Total                                                       256

$\kappa_1 = 5$, $\kappa_2 = 5/2$, $\kappa_3 = 0$, $\kappa_4 = 13/4$.




TABLE 3

Distribution of the number of black-white joins for 3 × 3 lattice

No. of B-W                    No. of black points
joins          0    1    2    3    4    5    6    7    8    9    Total
   0           1    -    -    -    -    -    -    -    -    1       2
   1           -    -    -    -    -    -    -    -    -    -       -
   2           -    4    -    -    -    -    -    -    4    -       8
   3           -    4    8    4    -    -    4    8    4    -      32
   4           -    1    6    4   12   12    4    6    1    -      46
   5           -    -   12   24   12   12   24   12    -    -      96
   6           -    -   10   26   36   36   26   10    -    -     144
   7           -    -    -   12   36   36   12    -    -    -      96
   8           -    -    -   10   13   13   10    -    -    -      46
   9           -    -    -    4   12   12    4    -    -    -      32
  10           -    -    -    -    4    4    -    -    -    -       8
  11           -    -    -    -    -    -    -    -    -    -       -
  12           -    -    -    -    1    1    -    -    -    -       2
 Total                                                            512

$\kappa_1 = 6$, $\kappa_2 = 3$, $\kappa_3 = 0$, $\kappa_4 = 4.5$.

















TABLE 4

Distribution of the number of black-white joins for 4 × 3 lattice

No. of B-W                          No. of black points
joins          0    1    2    3    4    5    6    7    8    9   10   11   12    Total
   0           1    -    -    -    -    -    -    -    -    -    -    -    1       2
   1           -    -    -    -    -    -    -    -    -    -    -    -    -       -
   2           -    4    -    -    -    -    -    -    -    -    -    4    -       8
   3           -    6    8    2    -    -    2    -    -    2    8    6    -      34
   4           -    2    8    8   10    4    -    4   10    8    8    2    -      64
   5           -    -   22   28   10   18   16   18   10   28   22    -    -     172
   6           -    -   22   46   56   42   30   42   56   46   22    -    -     362
   7           -    -    6   52   88   88  120   88   88   52    6    -    -     588
   8           -    -    -   50  119  162  156  162  119   50    -    -    -     818
   9           -    -    -   28  104  184  186  184  104   28    -    -    -     818
  10           -    -    -    6   58  134  192  134   58    6    -    -    -     588
  11           -    -    -    -   32   88  122   88   32    -    -    -    -     362
  12           -    -    -    -   16   46   48   46   16    -    -    -    -     172
  13           -    -    -    -    2   14   32   14    2    -    -    -    -      64
  14           -    -    -    -    -    8   18    8    -    -    -    -    -      34
  15           -    -    -    -    -    4    -    4    -    -    -    -    -       8
  16           -    -    -    -    -    -    -    -    -    -    -    -    -       -
  17           -    -    -    -    -    -    2    -    -    -    -    -    -       2
 Total                                                                          4096

$\kappa_1 = 8.5$, $\kappa_2 = 4.25$, $\kappa_3 = 0$, $\kappa_4 = 6.875$.

TABLE 5

Frequency distribution of the total number of joins between points of different colors for 1 black, 1 white and (mn - 2) red points

No. of joins    Frequency
     4          28
     5          $4(5a - 26)$
     6          $2(2a^2 - 25a + 4b + 56)$
     7          $2(-4a^2 + 2ab + 17a - 6b - 12)$
     8          $4a^2 - 4ab + b^2 - 4a + 3b - 12$


As in the case of black-black joins, the second factorial moment about zero is twice the sum of the expectations of the different ways of forming two black-white joins and can be determined by the method described in section 2.1.

(2.3.2)  $\mu'_{[2]} = 2(6b - 6a + 4)\,p_1p_2(p_1 + p_2) + 4(a^2 - 4ab + 4b^2 + 13a - 14b - 8)\,p_1^2p_2^2.$

From this, $\mu_2$ works out to be

(2.3.3)  $\mu_2 = 2(2b - a)\,p_1p_2 + 2(6b - 6a + 4)\,p_1p_2(p_1 + p_2) - 4(14b - 13a + 8)\,p_1^2p_2^2.$

2.4. First and second moments for the distribution of the total number of joins
between points of different colors for three colors. The expectation for free sampling
is

(2.4.1)  $$\mu'_1 = 2(2b - a)\sum p_rp_s .$$

The coefficients of $\sum p_rp_s$ and $\sum p_r^2p_s^2$ in the second moment are the same as those for
two colors. The coefficient of $p_1p_2p_3$ can be obtained from the frequency distribution
of the total number of joins between points of different colors when there
are 1 black, 1 white and (mn - 2) red points in the lattice. See Table 5.
Defining $S_{2(1,1,mn-2)} = \sum x^2 f_x$ for the above distribution,

$$S_{2(1,1,mn-2)} = 2(4a^2 - 30ab + 32b^2 + 55a - 54b - 32).$$

As in the case of two colors, the second moment about zero for three colors
reduces to the form

$$A_{21}(p_1 + p_2 + p_3)^{mn-2}\sum p_rp_s + A_{211}(p_1 + p_2 + p_3)^{mn-3}p_1p_2p_3 + A_{22}(p_1 + p_2 + p_3)^{mn-4}\sum p_r^2p_s^2 = A_{21}\sum p_rp_s + A_{211}\,p_1p_2p_3 + A_{22}\sum p_r^2p_s^2 ,$$

since $p_1 + p_2 + p_3 = 1$.

The coefficient of $p_1^{mn-2}p_2p_3$ on the left hand side of the above equation is equal to
$S_{2(1,1,mn-2)}$, i.e. $S_{2(1,1,mn-2)}$ = sum of the coefficients of $p_1^{mn-2}p_2p_3$ in
$A_{21}(p_1 + p_2 + p_3)^{mn-2}\sum p_rp_s$ and $A_{211}(p_1 + p_2 + p_3)^{mn-3}p_1p_2p_3$. Therefore the coefficient of $p_1p_2p_3$ in
$\mu_2$ is $S_{2(1,1,mn-2)}$ - coefficient of $p_1^{mn-2}p_2p_3$ in $2(8b - 7a + 4)(p_1 + p_2 + p_3)^{mn-2}\sum p_rp_s$ -
coefficient of $p_1p_2p_3$ in $4(2b - a)^2(\sum p_rp_s)^2$, that is,

$$S_{2(1,1,mn-2)} - 2(8b - 7a + 4)(2b - 3) - 8(2b - a)^2 = 4(17a - 19b - 10).$$

It can now be seen that

(2.4.2)  $$\mu_2 = 2(8b - 7a + 4)\sum p_rp_s - 4(14b - 13a + 8)\sum p_r^2p_s^2 - 4(19b - 17a + 10)\,p_1p_2p_3 .$$
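
The subtraction that yields the coefficient of $p_1p_2p_3$ is a polynomial identity in $a$ and $b$, so it can be checked mechanically. The following sketch is only a check under stated assumptions: it takes $a = m + n$, $b = mn$, uses the Table 5 frequencies with the constant 56 in the row for 6 joins, and the function names are illustrative rather than the author's.

```python
def table5(m, n):
    """Table 5 frequencies; assumes a = m + n and b = mn."""
    a, b = m + n, m * n
    return {
        4: 28,
        5: 4 * (5 * a - 26),
        6: 2 * (2 * a * a - 25 * a + 4 * b + 56),
        7: 2 * (-4 * a * a + 2 * a * b + 17 * a - 6 * b - 12),
        8: 4 * a * a - 4 * a * b + b * b - 4 * a + 3 * b - 12,
    }


def s2(m, n):
    """Second moment about zero, sum of x^2 f_x, of the Table 5 distribution."""
    return sum(x * x * f for x, f in table5(m, n).items())


def coefficient_p1p2p3(m, n):
    """Coefficient of p1 p2 p3 in the corrected second moment for three colors."""
    a, b = m + n, m * n
    # remove the part contributed by the sum p_r p_s term ...
    from_pairs = 2 * (8 * b - 7 * a + 4) * (2 * b - 3)
    # ... and the part contributed by the squared expectation
    from_mean_squared = 8 * (2 * b - a) ** 2
    return s2(m, n) - from_pairs - from_mean_squared
```

For a 4 x 5 lattice ($a = 9$, $b = 20$) the function returns $4(17 \cdot 9 - 19 \cdot 20 - 10) = -948$, in agreement with (2.4.2).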

2.5. First and second moments for the distribution of the total number of joins
between points of different colors for k colors. As in the previous cases, the expectation
for free sampling is

(2.5.1)  $$\mu'_1 = 2(2b - a)\sum p_rp_s .$$

The coefficients of $\sum p_rp_s$, $\sum p_rp_sp_t$ and $\sum p_r^2p_s^2$ in the second moment are the
same as those for three colors. The coefficient of $\sum p_rp_sp_tp_u$ is determined by finding
the distribution of joins between points of different colors when there are 1
black, 1 white, 1 red and (mn - 3) green points in the lattice. See Table 6.

$$S_{2(1,1,1,mn-3)} = 2(12a^2b - 69ab^2 + 72b^3 - 36a^2 + 330ab - 342b^2 - 408a + 348b + 240).$$

The coefficient of $\sum p_rp_sp_tp_u$ in $\mu_2$ can be obtained on the same lines as explained
for three colors and is equal to $S_{2(1,1,1,mn-3)}$ - coefficient of $p_1^{mn-3}p_2p_3p_4$ in
the homogeneous expression of degree mn in $\mu'_2$ for three colors $+\ 8(2b - a)^2$

$$= 8(14b - 13a + 8).$$


TABLE 6

Frequency distribution of the total number of joins between points of different
colors when there are 1 black, 1 white, 1 red and (mn - 3) green points

No. of joins    Frequency
     6          $240$
     7          $12(19a - 112)$
     8          $12(6a^2 - 78a + 7b + 208)$
     9          $4(2a^3 - 57a^2 + 15ab + 310a - 66b - 444)$
    10          $6(-4a^3 + 2a^2b + 36a^2 - 21ab + 2b^2 - 86a + 36b + 72)$
    11          $6(4a^3 - 4a^2b + ab^2 - 6a^2 + 8ab - 2b^2 - 10a - 40)$
    12          $-8a^3 + 12a^2b - 6ab^2 + b^3 - 24a^2 + 18ab - 3b^2 + 44a - 34b + 192$


It follows now that

$$\mu_2 = 2(8b - 7a + 4)\sum p_rp_s - 4(19b - 17a + 10)\sum p_rp_sp_t - 4(14b - 13a + 8)\sum p_r^2p_s^2 + 8(14b - 13a + 8)\sum p_rp_sp_tp_u .$$

In general the cumulants¹ for free sampling involve b and a in the first degree
only, and therefore, when m and n are large, the distribution tends to the normal
form. If x is the observed total number of joins between points of different
colors, the distribution of

$$\frac{x - 2(2b - a)\sum p_rp_s}{\sqrt{b}}$$

tends to the normal form with

$$16\sum p_rp_s - 76\sum p_rp_sp_t - 56\sum p_r^2p_s^2 + 112\sum p_rp_sp_tp_u$$

as its variance for large values of m and n.

¹ The author has recently obtained the third and fourth cumulants for this distribution.
They are linear functions of the dimensions of the lattice. The results will be published in
an early issue of the Ind. J. Agric. Stat.

For non-free sampling also, the distribution of

$$\frac{x - 2(2mn - m - n)\sum e_re_s}{\sqrt{mn}} ,$$

where $e_r = n_r/mn$, approaches the normal form having

$$4\sum e_re_se_t + 8\sum e_r^2e_s^2 - 16\sum e_re_se_te_u$$

as its variance. The error of this variance will be about 5% or less when m
and n are greater than 85.
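
The two limiting variances are symmetric functions of the color proportions and are easy to evaluate exactly. The sketch below assumes the coefficients 16, 76, 56, 112 for free sampling and 4, 8, 16 for non-free sampling as read from this section; the function names are illustrative, and exact rational arithmetic is used only for convenience.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def symmetric_sums(p):
    """Return sum p_r p_s, sum p_r p_s p_t, sum p_r^2 p_s^2, sum p_r p_s p_t p_u."""
    e2 = sum(x * y for x, y in combinations(p, 2))
    e3 = sum(prod(t) for t in combinations(p, 3))
    m22 = sum(x * x * y * y for x, y in combinations(p, 2))
    e4 = sum(prod(t) for t in combinations(p, 4))
    return e2, e3, m22, e4


def free_variance_2d(p):
    """Limiting variance of (x - mean)/sqrt(b) for free sampling."""
    e2, e3, m22, e4 = symmetric_sums(p)
    return 16 * e2 - 76 * e3 - 56 * m22 + 112 * e4


def nonfree_variance_2d(e):
    """Limiting variance of (x - mean)/sqrt(mn) for non-free sampling."""
    _, e3, m22, e4 = symmetric_sums(e)
    return 4 * e3 + 8 * m22 - 16 * e4
```

With two colors in proportions 1/4 and 3/4 the free-sampling value is 33/32 and the non-free value is 9/32; with equal proportions the two expressions coincide.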

3. Three- and higher-dimensional lattices. This section deals with the first
and the second moments for the distribution of black-black, black-white and
the total number of joins between points of different colors for three- and higher-
dimensional lattices. Besides these, the third and the fourth cumulants for the
distribution of black-white joins in a three-dimensional lattice with points
of two colors are also given.

3.1. First and second moments for the distribution of black-black joins. Free
sampling. Let $E_3(1)$ be the expectation of the number of black-black joins
for a lattice of sides l, m and n. Further let $A_2$ and $A_3$ be the number of ways
of obtaining a black-black join in m x n and l x m x n lattices. Then

$$E_3(1) = A_3p_1^2 ,$$

$$A_3 = A_2l + mn(l - 1),$$

and

$$A_2 = 2mn - m - n.$$

Therefore

(3.1.1)  $$E_3(1) = (3lmn - lm - mn - nl)\,p_1^2 .$$

For the sake of convenience all the results for the three-dimensional lattice
are expressed after making the following substitutions:

$$c = l + m + n, \qquad d = lm + mn + nl, \qquad e = lmn.$$

$E_3(1)$ in terms of c, d and e is

$$(3e - d)\,p_1^2 .$$

The expectation of the number of black-black joins for a lattice of r dimensions
($l_1 \times l_2 \times \cdots \times l_r$) is given by

(3.1.2)  $$E_r(1) = \left(r\,l_1l_2\cdots l_r - \sum l_1l_2\cdots l_{r-1}\right)p_1^2 ,$$

where $\sum l_1l_2\cdots l_{r-1}$ is the sum of the products of the sides taken (r - 1) at a
time.

It has been pointed out before that the second factorial moment is twice
the sum of the expectations of the different ways of forming two black-black
joins. Using this fact, if $2B_2$, $2B_3$, etc., are the coefficients of $p_1^3$ in the second
factorial moment for two-, three- and higher-dimensional lattices, it will be
found by direct enumeration made in succession from lattices of lower dimensions
that

$$B_r = B_{r-1}l_r + 4A_{r-1}(l_r - 1) + l_1l_2\cdots l_{r-1}(l_r - 2).$$

This can be established from the following considerations. 1) Two black-black
joins can be obtained from three black points situated close to one another
and the chance of having three black points in a specified manner is $p_1^3$. 2) The
number of ways of getting two black-black joins from three points in the lattice
is

$$B_{r-1}l_r + 4A_{r-1}(l_r - 1) + l_1l_2\cdots l_{r-1}(l_r - 2).$$

$C_r$, the coefficient of $p_1^4$ in the corrected second moment, is given by the equation

$$C_r = -(2B_r + A_r).$$

This follows from the fact that the sum of the coefficients of $p_1^3$ and $p_1^4$ in the
uncorrected factorial moment about zero is twice the number of ways of selecting
two joins from the total number of joins in the lattice, which is $A_r(A_r - 1)$.
Thus

(3.1.3)  $$A_rp_1^2 + 2B_rp_1^3 + C_rp_1^4$$

is the corrected second moment for the distribution of black-black joins in a
lattice of r dimensions. For an l x m x n lattice

(3.1.4)  $$\mu_2 = (3e - d)\,p_1^2 + 2(15e - 10d + 4c)\,p_1^3 - (33e - 21d + 8c)\,p_1^4 .$$
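
The recurrence for $B_r$ lends itself to a short computation. The sketch below is a minimal illustration under assumptions that are not spelled out in the text: the starting values $A_1 = l_1 - 1$ and $B_1 = l_1 - 2$ for a linear lattice, the generalisation $A_r = A_{r-1}l_r + l_1\cdots l_{r-1}(l_r - 1)$ of the three-dimensional relation, and all sides at least 2. The function name is hypothetical.

```python
def join_counts(sides):
    """Return (A_r, B_r, C_r) for a lattice with the given side lengths.

    A_r: number of joins; B_r: number of pairs of joins sharing a point;
    C_r = -(2 B_r + A_r). Assumes every side is at least 2.
    """
    first = sides[0]
    a_count, b_count, volume = first - 1, first - 2, first  # linear lattice (assumed)
    for side in sides[1:]:
        # B_r must be updated first because it uses A_{r-1}
        b_count = b_count * side + 4 * a_count * (side - 1) + volume * (side - 2)
        a_count = a_count * side + volume * (side - 1)
        volume *= side
    return a_count, b_count, -(2 * b_count + a_count)


def closed_form_3d(l, m, n):
    """Coefficients of (3.1.4) in terms of c, d and e."""
    c, d, e = l + m + n, l * m + m * n + n * l, l * m * n
    return 3 * e - d, 15 * e - 10 * d + 4 * c, -(33 * e - 21 * d + 8 * c)
```

For a 2 x 3 x 4 lattice both functions give $A_3 = 46$, $B_3 = 136$ and $C_3 = -318$, and the intermediate 2 x 3 step gives $B_2 = 6b - 6a + 4 = 10$.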

3.2. Cumulants for the distribution of black-white joins for two colors. The
first four cumulants for free and non-free sampling distributions in an l x m x n
lattice can be determined from the frequency distributions of black-white joins
for (1, lmn - 1), (2, lmn - 2), (3, lmn - 3) and (4, lmn - 4) black and white
points by the method described for linear and rectangular lattices. If

$$\mu'_r = A_{r1}\,pq + A_{r2}\,p^2q^2 + \cdots + A_{rr}\,p^rq^r ,$$

the first three distributions give the coefficients of $pq$, $p^2q^2$ and $p^3q^3$ in the first
three moments about zero. The three cumulants from these moments
are given below in terms of c, d, and e for free sampling.

(3.2.1)  $$\kappa_1 = 2(3e - d)\,pq ,$$

(3.2.2)  $$\kappa_2 = 2(18e - 11d + 4c)\,pq - 4(33e - 21d + 8c)\,p^2q^2 ,$$

(3.2.3)  $$\kappa_3 = 2(108e - 91d + 60c - 24)\,pq - 8(327e - 288d + 198c - 84)\,p^2q^2 + 32(219e - 197d + 138c - 60)\,p^3q^3 .$$

The calculation of the fourth cumulant by the direct method of finding the
frequency distribution of the number of black-white joins for 4 black and (lmn - 4)
white points was found to be very laborious and therefore this has been calculated
by a special method. The coefficients of $pq$, $p^2q^2$ and $p^3q^3$ have been determined,
as in other cases, by finding $\sum x^4f_x$ for the first three distributions. These
coefficients reduce to a linear form in c, d and e. Now the fourth cumulant being
a linear function of these quantities, the coefficient of $p^4q^4$ involves c, d and e
in the first degree only and therefore this can be assumed to be of the form

$$\alpha e + \beta d + \gamma c + \delta ,$$

where $\alpha$, $\beta$, $\gamma$ and $\delta$ are constants. No simple proof can be given here regarding
the linear assumption of the cumulants. It may be observed that this is true of
the first four cumulants for linear and rectangular lattices. The author [3] has
already provided a general proof of this assumption for the linear lattice and he
hopes to extend this for the higher dimensional lattices in the near future.²

The constants $\alpha$, $\beta$, $\gamma$ and $\delta$ can be determined by finding $\kappa_4$ for $p = q = \tfrac12$
from the frequency distributions of black-white joins for 2 x 2 x 2 and
2 x 2 x 3 lattices for two colors as given in Tables 7 and 8.

When $p = q = \tfrac12$, $\kappa_4$ reduces to the form $a'c + b'd + c'e + d'$, where $a'$, $b'$,
$c'$ and $d'$ are constants. In view of this relation, if m and n are fixed, and l takes
values 1, 2, 3, etc., the values of $\kappa_4$ for the different lattices should be in arithmetic
progression. This can be seen by comparing the values of $\kappa_4$ for the lattices
1 x 2 x 2, 2 x 2 x 2 and 3 x 2 x 2 which are 1, 7.5 and 14, respectively.
Using this property, it is possible to find $\kappa_4$ for a lattice of any size from the
complete distribution of the lattices 1 x 2 x 2, 1 x 2 x 3, 1 x 3 x 3, and 2 x 2 x 2
given before. Thus $\kappa_4$ for 2 x 2 x 2, 2 x 2 x 3, 3 x 3 x 2 and 3 x 3 x 3
lattices are 7.5, 14, 25.875 and 47.25 respectively. Now $\alpha$, $\beta$, $\gamma$ and $\delta$ can be
obtained by equating the general expression for the fourth cumulant to the values
given above for the corresponding values of l, m and n and putting $p = q = \tfrac12$.
The equations giving the values of $\alpha$, $\beta$, $\gamma$ and $\delta$ are

(3.2.4)  $$\begin{aligned} 8\theta_1 + 12\theta_2 + 6\theta_3 + \theta_4 &= 7.5,\\ 12\theta_1 + 16\theta_2 + 7\theta_3 + \theta_4 &= 14.0,\\ 18\theta_1 + 21\theta_2 + 8\theta_3 + \theta_4 &= 25.875,\\ 27\theta_1 + 27\theta_2 + 9\theta_3 + \theta_4 &= 47.25, \end{aligned}$$

where

$$\theta_1 = \frac{32 \times 19176 + \alpha}{256}, \quad \theta_2 = \frac{-32 \times 21638 + \beta}{256}, \quad \theta_3 = \frac{32 \times 20952 + \gamma}{256}, \quad \text{and} \quad \theta_4 = \frac{-32 \times 16128 + \delta}{256}.$$

They give

$$\alpha = -32 \times 19143, \quad \beta = 32 \times 21615, \quad \gamma = -32 \times 20940, \quad \text{and} \quad \delta = 32 \times 16128.$$

² This proof has been obtained recently and will be published soon.
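
The four equations (3.2.4) form a small linear system, and solving it exactly is a useful check on the constants. The sketch below is an illustration, not the author's working: it reads the coefficients in each row as $(e, d, c, 1)$ for the lattices 2 x 2 x 2, 2 x 2 x 3, 3 x 3 x 2 and 3 x 3 x 3, takes the offsets 19176, 21638, 20952 and 16128 in the definitions of the $\theta$'s, and uses plain Gaussian elimination over rationals.

```python
from fractions import Fraction


def solve_exact(rows, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    size = len(rows)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(rows, rhs)]
    for col in range(size):
        # pick any row at or below the diagonal with a non-zero entry as pivot
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][size] for r in range(size)]


# rows hold (e, d, c, 1) for 2x2x2, 2x2x3, 3x3x2 and 3x3x3
rows = [[8, 12, 6, 1], [12, 16, 7, 1], [18, 21, 8, 1], [27, 27, 9, 1]]
kappa4 = [Fraction(15, 2), Fraction(14), Fraction(207, 8), Fraction(189, 4)]
thetas = solve_exact(rows, kappa4)

# undo the substitution theta_i = (offset_i + constant_i) / 256
offsets = [32 * 19176, -32 * 21638, 32 * 20952, -32 * 16128]
alpha, beta, gamma, delta = (256 * t - o for t, o in zip(thetas, offsets))
```

Under these readings the solution is $\theta = (33/8, -23/8, 3/2, 0)$, so that at $p = q = \tfrac12$ the fourth cumulant is $(33e - 23d + 12c)/8$.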


TABLE 7

Frequency distribution of black-white joins, 2 x 2 x 2 lattice for two colors

No. of black-                  No. of black points
white joins      0    1    2    3    4    5    6    7    8    Total
     0           1    -    -    -    -    -    -    -    1       2
     1           -    -    -    -    -    -    -    -    -       -
     2           -    -    -    -    -    -    -    -    -       -
     3           -    8    -    -    -    -    -    8    -      16
     4           -    -   12    -    6    -   12    -    -      30
     5           -    -    -   24    -   24    -    -    -      48
     6           -    -   16    -   32    -   16    -    -      64
     7           -    -    -   24    -   24    -    -    -      48
     8           -    -    -    -   30    -    -    -    -      30
     9           -    -    -    8    -    8    -    -    -      16
    10           -    -    -    -    -    -    -    -    -       -
    11           -    -    -    -    -    -    -    -    -       -
    12           -    -    -    -    2    -    -    -    -       2
  Total                                                        256

$\kappa_1 = 6$, $\kappa_2 = 3$, $\kappa_3 = 0$, $\kappa_4 = 7.5$


Thus the general formula for the fourth cumulant is

(3.2.5)  $$\begin{aligned} \kappa_4 ={}& 2(648e - 671d + 604c - 432)\,pq\\ &- 4(9996e - 10857d + 10196c - 7632)\,p^2q^2\\ &+ 32(9144e - 10167d + 9732c - 7416)\,p^3q^3\\ &- 32(19143e - 21615d + 20940c - 16128)\,p^4q^4 . \end{aligned}$$

For a lattice $l_1 \times l_2 \times \cdots \times l_r$ in r dimensions, the first two moments
for the distribution of black-white joins for free sampling are as follows:

(3.2.6)  $$\mu'_1 = 2A_r\,pq ,$$

(3.2.7)  $$\mu_2 = 2(A_r + B_r)\,pq + 4C_r\,p^2q^2 .$$
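
For small lattices the cumulants at $p = q = \tfrac12$ can be obtained by enumerating every coloring, which gives an independent check on the tabulated values and on the cumulant formulas of this section. The sketch below is a brute-force illustration with hypothetical function names. It assumes the coefficients as written in the code, in particular the constant 16128 in the last bracket of the fourth cumulant, which is the value that reproduces $\kappa_4 = 7.5$ for the 2 x 2 x 2 lattice.

```python
from fractions import Fraction
from itertools import product


def brute_force_cumulants(l, m, n):
    """Exact cumulants k1..k4 of the black-white join count at p = q = 1/2."""
    points = list(product(range(l), range(m), range(n)))
    index = {pt: i for i, pt in enumerate(points)}
    joins = []
    for (x, y, z) in points:
        for nb in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            if nb in index:
                joins.append((index[(x, y, z)], index[nb]))
    count = len(points)
    power_sums = [0, 0, 0, 0]  # sums of X, X^2, X^3, X^4 over all colorings
    for mask in range(1 << count):  # bit i set means point i is black
        x = sum(((mask >> u) ^ (mask >> v)) & 1 for u, v in joins)
        for r in range(4):
            power_sums[r] += x ** (r + 1)
    m1, m2, m3, m4 = (Fraction(s, 1 << count) for s in power_sums)
    k2 = m2 - m1 ** 2
    k3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    k4 = m4 - 4 * m1 * m3 - 3 * m2 ** 2 + 12 * m1 ** 2 * m2 - 6 * m1 ** 4
    return m1, k2, k3, k4


def formula_cumulants(l, m, n, p):
    """Cumulants from the formulas in c, d and e; t stands for pq."""
    c, d, e = l + m + n, l * m + m * n + n * l, l * m * n
    t = p * (1 - p)
    k1 = 2 * (3 * e - d) * t
    k2 = 2 * (18 * e - 11 * d + 4 * c) * t - 4 * (33 * e - 21 * d + 8 * c) * t ** 2
    k3 = (2 * (108 * e - 91 * d + 60 * c - 24) * t
          - 8 * (327 * e - 288 * d + 198 * c - 84) * t ** 2
          + 32 * (219 * e - 197 * d + 138 * c - 60) * t ** 3)
    k4 = (2 * (648 * e - 671 * d + 604 * c - 432) * t
          - 4 * (9996 * e - 10857 * d + 10196 * c - 7632) * t ** 2
          + 32 * (9144 * e - 10167 * d + 9732 * c - 7416) * t ** 3
          - 32 * (19143 * e - 21615 * d + 20940 * c - 16128) * t ** 4)
    return k1, k2, k3, k4
```

The enumeration is only practical for a dozen or so points, since the number of colorings doubles with each point.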






Like the distributions for linear and rectangular lattices, when l, m and n
tend to infinity, $\gamma_1$ and $\gamma_2$ will tend to zero and therefore the distribution of
black-white joins for an l x m x n lattice also tends to the normal form. The
remarks made in connection with the distribution of black-white joins for a
rectangular lattice are true here also. Here the frequencies for 1, 2, [(3e - d)
- 2] and [(3e - d) - 1] black-white joins are zero, while for 0 and (3e - d)
they are two. But this irregularity will not affect the limiting form of the
distribution since the relative frequencies tend to zero.

TABLE 8

Frequency distribution of black-white joins for 2 x 2 x 3 lattice for two colors

No. of black-                            No. of black points
white joins     0    1    2    3    4    5    6    7    8    9   10   11   12   Total
     0          1    -    -    -    -    -    -    -    -    -    -    -    1       2
     1          -    -    -    -    -    -    -    -    -    -    -    -    -       -
     2          -    -    -    -    -    -    -    -    -    -    -    -    -       -
     3          -    8    -    -    -    -    -    -    -    -    -    8    -      16
     4          -    4    8    -    2    -    -    -    2    -    8    4    -      28
     5          -    -    8    8    -    -    -    -    -    8    8    -    -      32
     6          -    -   24   20    8    8   12    8    8   20   24    -    -     132
     7          -    -   24   48   40   40   16   40   40   48   24    -    -     320
     8          -    -    2   52   81   56   68   56   81   52    2    -    -     450
     9          -    -    -   40  104  112  144  112  104   40    -    -    -     656
    10          -    -    -   44  100  188  160  188  100   44    -    -    -     824
    11          -    -    -    8   88  144  176  144   88    8    -    -    -     656
    12          -    -    -    -   36  108  162  108   36    -    -    -    -     450
    13          -    -    -    -   24   88   96   88   24    -    -    -    -     320
    14          -    -    -    -   12   28   52   28   12    -    -    -    -     132
    15          -    -    -    -    -    8   16    8    -    -    -    -    -      32
    16          -    -    -    -    -    4   20    4    -    -    -    -    -      28
    17          -    -    -    -    -    8    -    8    -    -    -    -    -      16
    18          -    -    -    -    -    -    -    -    -    -    -    -    -       -
    19          -    -    -    -    -    -    -    -    -    -    -    -    -       -
    20          -    -    -    -    -    -    2    -    -    -    -    -    -       2
  Total                                                                          4096

$\kappa_1 = 10$, $\kappa_2 = 5$, $\kappa_3 = 0$, $\kappa_4 = 14$

3.3. First and second moments for the distribution of black-white joins for k
colors in an r-dimensional lattice. The results for free sampling follow easily from
a consideration of the expectations of the various ways of obtaining one and
two black-white joins. The expectation of the number of black-white joins is

(3.3.1)  $$2A_r\,p_1p_2 .$$

The expectation for two black-white joins is

$$B_r\,p_1p_2(p_1 + p_2) + 2(A_r^2 - A_r - 2B_r)\,p_1^2p_2^2 .$$

From this it will follow that the second moment is

(3.3.2)  $$\mu_2 = 2A_r\,p_1p_2 + 2B_r\,p_1p_2(p_1 + p_2) + 4C_r\,p_1^2p_2^2 .$$

3.4. First and second moments for the distribution of the total number of joins
between points of different colors for an l x m x n lattice for three colors. The
expectation for free sampling is

(3.4.1)  $$2(3e - d)\sum p_rp_s .$$

TABLE 9

Distribution of joins between points of different colors for 1 black, 1 white and
(lmn - 2) red points

                                  Frequency for lattices
No. of joins              2 x 2 x 2   2 x 2 x 3   2 x 3 x 3   3 x 3 x 3
     5                        24          16           8           -
     6                        32          56          80         104
     7                         -          56         104         144
     8                         -           4          96         276
     9                         -           -          18         112
    10                         -           -           -          66
  Total                       56         132         306         702
  $\sum x^2f_x$ about zero  1752        5416       15778       44136


The second moment will involve terms in $\sum p_rp_s$, $p_1p_2p_3$ and $\sum p_r^2p_s^2$. The
coefficients of $\sum p_rp_s$ and $\sum p_r^2p_s^2$ are the same as those for two colors. The coefficient
of $p_1p_2p_3$ can be determined by finding the frequency distribution of joins
between points of different colors when the lattice consists of 1 black, 1 white and
(lmn - 2) red points. But this straightforward method is cumbersome, and
hence the coefficient of $p_1p_2p_3$ has been determined by finding the distribution for
the special lattices 2 x 2 x 2, 2 x 2 x 3, 2 x 3 x 3, and 3 x 3 x 3. These
results are shown in Table 9.

The coefficients of $p_1p_2p_3$ in the corrected second moment for the above lattices
are obtained by subtracting $2(18e - 11d + 4c)(2e - 3) + 8(3e - d)^2$ from the
moments noted above. This can be seen to be so by comparing the above
expression with the quantity subtracted from the uncorrected second moment for
a two dimensional lattice in section 2.4. The coefficients so obtained for 2 x 2 x 2,
2 x 2 x 3, 2 x 3 x 3, and 3 x 3 x 3 lattices are -336, -640, -1184 and
-2142 respectively. Now the coefficient of $p_1p_2p_3$ in the corrected second moment
is of the form

$$\alpha'e + \beta'd + \gamma'c + \delta' .$$

The equations obtained by equating this expression to -336, -640, -1184
and -2142 for the respective lattices give $\alpha' = -174$, $\beta' = 108$, $\gamma' = -40$
and $\delta' = 0$. Thus the second moment for a lattice with points in three colors is

(3.4.2)  $$2(18e - 11d + 4c)\sum p_rp_s - 2(87e - 54d + 20c)\,p_1p_2p_3 - 4(33e - 21d + 8c)\sum p_r^2p_s^2 .$$

TABLE 10

Distribution of joins between points of different colors when there are 1 black,
1 white, 1 red and (lmn - 3) green points

                                  Frequency for lattices
No. of joins              2 x 2 x 2   2 x 2 x 3   2 x 3 x 3   3 x 3 x 3
     7                       144          48           -           -
     8                       144         312         288          72
     9                        48         480         912        1344
    10                         -         432        1344        2664
    11                         -          48        1560        4392
    12                         -           -         720        4584
    13                         -           -          72        3168
    14                         -           -           -        1206
    15                         -           -           -         120
  Total                      336        1320        4896       17550
  $\sum x^2f_x$ about zero 20160      110208      531312     2370168
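
The second-moment row of Table 9 and the three-color coefficients derived from it can be recomputed by enumeration. The sketch below is a check under stated assumptions rather than the author's method: it places one black and one white point in every ordered pair of positions of an l x m x n lattice, treats the rest as red, and applies the subtraction $2(18e - 11d + 4c)(2e - 3) + 8(3e - d)^2$ described for the coefficient of $p_1p_2p_3$. Function names are illustrative.

```python
from itertools import permutations, product


def sum_x2(l, m, n):
    """Sum of x^2 f_x for 1 black, 1 white and lmn - 2 red points."""
    sides = (l, m, n)
    points = list(product(range(l), range(m), range(n)))

    def degree(p):
        # number of lattice neighbours of p inside the block
        return sum((p[i] > 0) + (p[i] < sides[i] - 1) for i in range(3))

    total = 0
    for black, white in permutations(points, 2):
        adjacent = sum(abs(black[i] - white[i]) for i in range(3)) == 1
        # a black-white join is shared by both points, so count it once
        total += (degree(black) + degree(white) - adjacent) ** 2
    return total


def three_color_coefficient(l, m, n):
    """Coefficient of p1 p2 p3 in the corrected second moment."""
    c, d, e = l + m + n, l * m + m * n + n * l, l * m * n
    correction = 2 * (18 * e - 11 * d + 4 * c) * (2 * e - 3) + 8 * (3 * e - d) ** 2
    return sum_x2(l, m, n) - correction
```

For the four special lattices the function returns -336, -640, -1184 and -2142, which equal $-2(87e - 54d + 20c)$ in each case.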

3.5. First and second moments for the distribution of the total number of joins
between points of different colors in an l x m x n lattice for four or more colors.
The expectations for free sampling are given by the same expression as for three
colors. The coefficients of $\sum p_rp_s$, $\sum p_rp_sp_t$ and $\sum p_r^2p_s^2$ in the corrected second
moment are also the same as in section 3.4. The coefficient of $\sum p_rp_sp_tp_u$ can be
determined by the method described in section 3.4 for $p_1p_2p_3$ from the frequency
distributions of joins (Table 10) between points of different colors for 2 x 2 x 2,
2 x 2 x 3, 2 x 3 x 3 and 3 x 3 x 3 lattices when they consist of 1 black, 1
white, 1 red and (e - 3) green points.

The coefficient of $\sum p_rp_sp_tp_u$ in the corrected second moment is obtained by
subtracting (obtained in the same way as for the two dimensional lattice in section
2.5)

$$6(18e - 11d + 4c)(e - 2)^2 + (3e - 8)\left[2(-87e + 54d - 20c) + 8(3e - d)^2\right] - 8(3e - d)^2$$

from the uncorrected values. The values so obtained for the four lattices are
480 (2 x 2 x 2), 928 (2 x 2 x 3), 1736 (2 x 3 x 3) and 3168 (3 x 3 x 3). The
coefficient of $\sum p_rp_sp_tp_u$, as in other cases, being of the form

$$\alpha''e + \beta''d + \gamma''c + \delta'' ,$$

$\alpha''$, $\beta''$, $\gamma''$ and $\delta''$ can be determined by equating the above expression to 480,
928, 1736 and 3168 for the respective lattices. The coefficient so obtained is

$$8(33e - 21d + 8c).$$

Hence the second moment for free sampling when the lattice contains points of
four or more colors is

(3.5.1)  $$\begin{aligned} \mu_2 ={}& 2(18e - 11d + 4c)\sum p_rp_s - 2(87e - 54d + 20c)\sum p_rp_sp_t\\ &- 4(33e - 21d + 8c)\sum p_r^2p_s^2 + 8(33e - 21d + 8c)\sum p_rp_sp_tp_u . \end{aligned}$$

In general, it will be found that the cumulants involve terms in c, d, e and an
absolute term only. Therefore when l, m and n tend to infinity and $p_1, p_2, p_3, \ldots$
are finite, the distribution of $R - 2(3e - d)\sum p_rp_s$, where R is the total number
of joins of points of different colors, tends to the normal form. When l, m and n
are large,

$$\frac{R - 2(3e - d)\sum p_rp_s}{\sqrt{e}}$$

can be considered to be normally distributed with

(3.5.2)  $$36\sum p_rp_s - 174\sum p_rp_sp_t - 132\sum p_r^2p_s^2 + 264\sum p_rp_sp_tp_u$$

as its variance.

The distribution for non-free sampling here also tends to the normal form for
the same reasons given for the rectangular lattice. As in free sampling, for large
values of l, m and n

$$\frac{R - 2(3e - d)\sum e_re_s}{\sqrt{lmn}}$$

is distributed normally with the variance

$$6\sum e_re_se_t + 12\sum e_r^2e_s^2 - 24\sum e_re_se_te_u ,$$



where R is the observed number of joins for a given distribution of the points
and $e_r = n_r/lmn$. The error in this variance will be about 5% or less when l, m and
n are greater than 86.

We may conclude this section by giving the first and the second moments for
free sampling with k colors for an r-dimensional lattice.

(3.5.3)  $$\mu'_1 = 2A_r\sum p_rp_s ,$$

(3.5.4)  $$\mu_2 = 2(A_r + B_r)\sum p_rp_s + 2(3B_r + 4C_r)\sum p_rp_sp_t + 4C_r\sum p_r^2p_s^2 - 8C_r\sum p_rp_sp_tp_u ,$$

where $A_r$, $B_r$ and $C_r$ are as defined in section 3.1.
This can be seen from the following facts:

(1) The coefficients of $\sum p_rp_s$ and $\sum p_r^2p_s^2$ are the same as for two colors.

(2) The coefficient of $\sum p_rp_sp_t$ is the number of ways of getting two joins of
different colors from combinations of points not included in $\sum p_rp_sp_tp_u$. This can
be had from three points of three different colors close together and four points
of three different colors separated into groups of two each such that each group
will give one join. The number of arrangements of the first kind is $3!\,B_r$. For
the second kind it is $8(A_r^2 + C_r)$. Subtracting from the total number the
contribution of $\sum p_rp_sp_t$ in the correction factor $4A_r^2(\sum p_rp_s)^2$, the coefficient of $\sum p_rp_sp_t$
in the second moment works out to be

$$2(3B_r + 4C_r).$$

(3) The coefficient of $\sum p_rp_sp_tp_u$, as in all other cases dealt with before, is twice that
of $\sum p_r^2p_s^2$ with an opposite sign.

Acknowledgements. The author's thanks are due to Dr. D. J. Finney for
suggesting this problem and for all the facilities and help given to him in carrying
out the investigations discussed in this paper. The author is also grateful to
Mr. P. A. P. Moran for explaining the results of his investigations on this problem
and for the interest taken by him in the course of the research.

REFERENCES

[1] L. v. Bortkiewicz, Die Iterationen, Berlin, 1917.

[2] R. C. Bose, "The patch number problem," Sci. Cult., Vol. 12 (1946), p. 109.

[3] P. V. Krishna Iyer, "The theory of probability distribution of points on a line,"
Indian J. Agric. Stat., Vol. 1 (1948).

[4] H. Levene, "A test of randomness in two dimensions," Ann. Math. Stat., Vol. 17 (1946),
p. 500.

[5] P. A. P. Moran, "Random associations on a lattice," Proc. Camb. Phil. Soc., Vol. 43 (1947),
p. 321.

[6] P. A. P. Moran, "The interpretation of statistical maps," J. Roy. Stat. Soc., B, Vol. 10
(1948), p. 243.



MINIMAX ESTIMATES OF THE MEAN OF A NORMAL DISTRIBUTION 
WITH KNOWN VARIANCE 

By J. Wolfowitz¹

Columbia University

Summary. It is proved that the classical estimation procedures for the mean
of a normal distribution with known variance are minimax solutions of properly
formulated problems. A result of Stein and Wald [1] is an immediate consequence.
Other such optimum properties follow. Sequential and non-sequential problems
can be treated in this manner. Interval and point estimation are discussed.


1. Sequential estimation by an interval of given length l. In this section we
shall consider the problem of sequentially estimating the mean of a normal
distribution with known variance by an interval of fixed length l. Without loss of
generality we shall take the known variance to be unity. Such a sequential
estimation procedure, which we shall designate generically by G, is a rule which says a)
when to terminate taking random, independent observations on the normal
chance variable with unknown mean $\xi$ ($-\infty < \xi < \infty$) and variance 1, and
when this termination is to occur after the observations $x_1, \ldots, x_n$ have been
obtained, gives b) the center of the estimating interval of length l as a function
of $x_1, \ldots, x_n$. Let $a(\xi, G)$ be the probability under G that the estimating interval
will contain $\xi$, and let $n(\xi, G)$ be the expected number of observations when $\xi$ is
the mean and G is the estimation procedure. (It is assumed that G is such
that $a(\xi, G)$ and $n(\xi, G)$ exist for all $\xi$.)

Define

$$q(\xi, G) = 1 - a(\xi, G),$$

and for fixed c > 0

(1.1)  $$W(\xi, G) = q(\xi, G) + c\,n(\xi, G).$$

Let C(N, l) (l > 0, N a positive integer) be the classical non-sequential estimation
procedure where one takes the fixed number N of observations, and estimates
the mean by the interval $\left(\bar{x} - \tfrac{l}{2},\ \bar{x} + \tfrac{l}{2}\right)$, where $\bar{x}$ is the sample mean. For p
such that 0 < p < 1, let C(p, N, l) be the following estimation procedure: A
chance experiment with two outcomes, N and N + 1, of respective probabilities
p and 1 - p, is performed. One then proceeds according to C(i, l), where i (= N,
N + 1) is the outcome of the experiment. Finally define

$$M(y) = \frac{1}{\sqrt{2\pi}}\int_y^{\infty} \exp\left\{-\tfrac12 z^2\right\}dz .$$

¹ Research under a contract with the Office of Naval Research.



Let us assume for a moment that the unknown $\xi$ is itself a chance variable,
normally distributed with mean zero and variance $\sigma^2$, and let us obtain a
procedure G which minimizes

(1.2)  $$E\{q(\xi, G) + c\,n(\xi, G)\} = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty}\big(q(y, G) + c\,n(y, G)\big)\exp\left\{-\frac{y^2}{2\sigma^2}\right\}dy .$$

Let $x_1, \ldots, x_m$ be m independent observations on a normal chance variable
with mean $\xi$ and variance 1. Let

$$\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i .$$

The a posteriori distribution of $\xi$, given $x_1, \ldots, x_m$, is easily verified (or see
[1], eqs. (19) and (20)) to be normal with mean

(1.3)  $$\bar{x}\left(1 + \frac{1}{m\sigma^2}\right)^{-1}$$

and variance

(1.4)  $$\left(m + \frac{1}{\sigma^2}\right)^{-1} .$$

Thus if we stop after m observations the best procedure from the point of view
of minimizing (1.2) is to put the center of the estimating interval of length l at
the point (1.3). The conditional expected value of $q(\xi)$ is then

(1.5)  $$Q(x_1, \ldots, x_m \mid \sigma^2) = 2M\left(\frac{l}{2}\sqrt{m + \frac{1}{\sigma^2}}\right) .$$

Thus $Q(x_1, \ldots, x_m)$ is a function only of m and $\sigma^2$. Define

(1.6)  $$R(m, \sigma^2) = 2M\left(\frac{l}{2}\sqrt{m + \frac{1}{\sigma^2}}\right) - 2M\left(\frac{l}{2}\sqrt{m + 1 + \frac{1}{\sigma^2}}\right) .$$

We note that $R(m, \sigma^2)$ is, for fixed $\sigma^2$, a decreasing function of m. We conclude
that a best decision as to whether or not to take another observation must be
based on the value of $R(m, \sigma^2)$. If $R(m, \sigma^2) > c$ take another observation, if
$R(m, \sigma^2) < c$ do not take another observation, if $R(m, \sigma^2) = c$ take either action
at pleasure. Hence, if c is such that $R(N, \sigma^2) \le c \le R(N - 1, \sigma^2)$, a best
procedure from the point of view of minimizing (1.2) is to take exactly N
observations. This integer N is a function of c and $\sigma^2$, thus: $N(c, \sigma^2)$. In the next
paragraph we shall show that $N(c, \sigma^2)$ can be defined for every positive c and $\sigma^2$.
It is clearly a function which takes at most two values. We shall denote by $G(\sigma^2)$
the estimation procedure described above which minimizes (1.2). It consists of
taking the fixed number $N(c, \sigma^2)$ of observations and putting the center of the
estimating interval of length l at the point (1.3). Where $N(c, \sigma^2)$ is double-valued
we may take either value at pleasure. We verify that the value of (1.2) is the
same for either choice.
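
The stopping rule can be made concrete numerically. The sketch below is an illustration under assumptions: it takes $M(y)$ to be the upper tail of the standard normal distribution, written with the complementary error function, takes $R(m, \sigma^2)$ as the difference of two values of $2M$ at posterior precisions $m + 1/\sigma^2$ and $m + 1 + 1/\sigma^2$, and uses hypothetical function names and parameter values.

```python
import math


def upper_tail(y):
    """M(y): probability that a standard normal variable exceeds y."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))


def gain(m, sigma2, length):
    """R(m, sigma^2): drop in posterior non-coverage from one more observation."""
    precision = m + 1.0 / sigma2
    before = 2.0 * upper_tail(0.5 * length * math.sqrt(precision))
    after = 2.0 * upper_tail(0.5 * length * math.sqrt(precision + 1.0))
    return before - after


def best_sample_size(cost, sigma2, length):
    """Smallest m with R(m, sigma^2) <= cost (cost must be positive)."""
    m = 0
    while gain(m, sigma2, length) > cost:
        m += 1
    return m


def bayes_risk(m, cost, sigma2, length):
    """Value of (1.2) for the fixed-sample procedure with m observations."""
    precision = m + 1.0 / sigma2
    return cost * m + 2.0 * upper_tail(0.5 * length * math.sqrt(precision))
```

Because $R(m, \sigma^2)$ decreases in m and tends to zero, the loop always terminates, and the sample size it returns also minimizes the average risk over fixed sample sizes.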



We now verify that $N(c, \sigma^2)$ can be defined for all c > 0 and $\sigma^2$. We have
remarked earlier that $R(m, \sigma^2)$ is, for fixed $\sigma^2$, a monotonically decreasing
function of m. We note that

$$\lim_{m \to \infty} R(m, \sigma^2) = 0.$$

When $c > R(0, \sigma^2)$ we take no observations whatever and take $\bar{x} = 0$. When
$c = R(0, \sigma^2)$ we take zero or one observation at pleasure.

Without difficulty we compute

$$W(\xi, G(\sigma^2)) = W(\xi, \sigma^2) = cN + M\left(\frac{\tfrac{l}{2}(1 + N\sigma^2) + \xi}{\sigma^2\sqrt{N}}\right) + M\left(\frac{\tfrac{l}{2}(1 + N\sigma^2) - \xi}{\sigma^2\sqrt{N}}\right),$$

where for typographical simplicity we have written N for $N(c, \sigma^2)$. For fixed c and
$\sigma^2$ the minimum of $W(\xi, \sigma^2)$ occurs at $\xi = 0$. Also $W(0, \sigma^2)$ is a monotonically
increasing function of $\sigma^2$. If $N(c, \infty) > 0$ then, as $\sigma^2 \to \infty$, it approaches the limit

$$c\,N(c, \infty) + 2M\left(\frac{l}{2}\sqrt{N(c, \infty)}\right),$$

which is the constant value of

$$W(\xi, C(N(c, \infty), l)).$$

We therefore conclude that $C(N(c, \infty), l)$ is a minimax estimating procedure of
type G, i.e.,

$$W(\xi, C(N(c, \infty), l)) = \inf_G \sup_{\xi} W(\xi, G)$$

for any c > 0. (The case $N(c, \infty) = 0$ may be verified separately. We define
$\bar{x} = 0$ for C(0, l).)

Conversely, let $N_0$ be a given non-negative integer. Then $C(N_0, l)$ is a minimax
estimating procedure G for all $W(\xi, G)$ for which c satisfies

$$R(N_0, \infty) \le c \le R(N_0 - 1, \infty).$$

(We define $R(-1, \infty) = \infty$.) Thus we can say: For every c > 0 there exists a
classical estimation procedure C(N, l) with integral N such that

$$W(\xi, C(N, l)) = \inf_G \sup_{\xi} W(\xi, G).$$

For every integral N we can find at least one c > 0 such that the above equation
holds. (We have taken the liberty of calling C(0, l) a classical procedure.)

Let $a_0$ be a given number such that

$$1 - 2M\left(\frac{l}{2}\right) \le a_0 < 1.$$

Define $p_0$, $0 < p_0 \le 1$, and a positive integral $N_0$ uniquely by

$$a_0 = p_0\left(1 - 2M\left(\sqrt{N_0}\,\frac{l}{2}\right)\right) + (1 - p_0)\left(1 - 2M\left(\sqrt{N_0 + 1}\,\frac{l}{2}\right)\right).$$

Let

$$c_0 = R(N_0, \infty).$$

For $c = c_0$ we verify readily that both $C(N_0, l)$ and $C(N_0 + 1, l)$ are minimax
estimating procedures G, so that

$$\begin{aligned} W(\xi, C(N_0, l)) &= W(\xi, C(N_0 + 1, l))\\ &= p_0\,W(\xi, C(N_0, l)) + (1 - p_0)\,W(\xi, C(N_0 + 1, l))\\ &= (1 - a_0) + c_0[p_0N_0 + (1 - p_0)(N_0 + 1)]\\ &= (1 - a_0) + c_0[N_0 + (1 - p_0)]. \end{aligned}$$

Therefore, for any G whatever,

$$(1 - a_0) + c_0[N_0 + (1 - p_0)] \le \sup_{\xi}\{q(\xi, G) + c_0\,n(\xi, G)\} \le \sup_{\xi} q(\xi, G) + c_0\sup_{\xi} n(\xi, G).$$

Hence

$$\sup_{\xi} q(\xi, G) \le 1 - a_0$$

implies

$$\sup_{\xi} n(\xi, G) \ge N_0 + (1 - p_0),$$

a result first proved by Stein and Wald [1].
Also

$$\sup_{\xi} n(\xi, G) \le N_0 + (1 - p_0)$$

implies

$$\sup_{\xi} q(\xi, G) \ge 1 - a_0 ,$$

a result also proved in [1].

2. A sequential upper bound for the mean. The fact that in the last section l
was a constant made matters simpler, as we see when we begin to consider the
problem of a sequential upper bound for $\xi$ ($-\infty < \xi < \infty$). This of course means
that we wish to use as estimating interval the interval $(-\infty, L(x_1, \ldots, x_n))$
where L is a function of the observations $x_1, \ldots, x_n$, and n (a chance variable)
is the number of observations before the process of taking observations is
terminated. What is wanted now is a suitable definition of the "length" of this
interval. Also we shall admit the possibility that it might be in some cases
advantageous to have intervals of varying length; this poses the problem of optimum
choice of the function $L(x_1, \ldots, x_n)$.

As before, let $\xi$ be the mean of a normal distribution with unit variance. Let
T be the generic estimation procedure which consists of a rule for terminating the
taking of observations, and of a function $L_T(x_1, \ldots, x_n)$ which is used to
estimate $\xi$ by the interval $(-\infty, L_T)$. Define

$$q(\xi, T) = P\{L_T < \xi\},$$

$$\lambda(\xi, T) = E\{(L_T - \xi)^2\},$$

and

(2.1)  $$W(\xi, T) = q(\xi, T) + k\lambda(\xi, T) + c\,n(\xi, T),$$

where c and k are positive constants. (We admit only such T for which the
quantities q, $\lambda$, and n are defined for all real $\xi$.) As before, let us temporarily assume
that $\xi$ is normally distributed with mean zero and variance $\sigma^2$ and set ourselves
the task of minimizing

(2.2)  $$\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} W(y, T)\exp\left\{-\frac{y^2}{2\sigma^2}\right\}dy = W^*(T, \sigma^2)$$

with respect to T. In the next paragraph we digress for a moment to derive a
needed elementary inequality.

Let us prove that, if h, $h_1$, and $h_2$ are non-negative, and

(2.3)  $$h^2 = p\,h_1^2 + (1 - p)\,h_2^2 ,$$

where 0 < p < 1, then

(2.4)  $$M(h) \le p\,M(h_1) + (1 - p)\,M(h_2).$$

Hold h and p fixed. The desired result is obviously true when $h_1 = h_2 = h$. Let
$h_1$ and $h_2$ vary, subject to (2.3). Then

$$\frac{dh_2}{dh_1} = \frac{-p\,h_1}{(1 - p)\,h_2} .$$

Also

$$p\,\frac{dM(h_1)}{dh_1} = \frac{-p}{\sqrt{2\pi}}\,e^{-h_1^2/2}$$

and

$$(1 - p)\,\frac{dM(h_2)}{dh_1} = (1 - p)\,\frac{dM(h_2)}{dh_2}\,\frac{dh_2}{dh_1} = \frac{p\,h_1}{\sqrt{2\pi}\,h_2}\,e^{-h_2^2/2} .$$

Thus the derivative of the right member of (2.4) with respect to $h_1$ is 0 when
$h_1 = h$, positive when $h_1 > h$, and negative when $h_1 < h$. From this we get (2.4).
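
Inequality (2.4) is easy to probe numerically before relying on it. The sketch below is only a sanity check, not a proof: it assumes $M$ is the upper tail of the standard normal distribution, draws random non-negative $h_1$, $h_2$ and weights $p$ from arbitrarily chosen ranges, and allows a small tolerance for rounding.

```python
import math
import random


def upper_tail(y):
    """M(y): probability that a standard normal variable exceeds y."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))


def inequality_holds(h1, h2, p, tol=1e-12):
    """Check M(h) <= p M(h1) + (1 - p) M(h2) with h^2 = p h1^2 + (1 - p) h2^2."""
    h = math.sqrt(p * h1 * h1 + (1.0 - p) * h2 * h2)
    return upper_tail(h) <= p * upper_tail(h1) + (1.0 - p) * upper_tail(h2) + tol


def random_check(trials=10000, seed=0):
    """Return True if no counterexample turns up in the sampled cases."""
    rng = random.Random(seed)
    return all(
        inequality_holds(rng.uniform(0.0, 6.0), rng.uniform(0.0, 6.0),
                         rng.uniform(0.001, 0.999))
        for _ in range(trials)
    )
```

Equality holds when $h_1 = h_2$, which is the case the proof starts from.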



MINUKX ?;bTIMATKS OF MEAN 


223 


Let $T$ be any estimation procedure and $L_T(x_1, \cdots, x_n)$ its associated function. Write

$$l_T(x_1, \cdots, x_n) = L_T(x_1, \cdots, x_n) - \frac{\bar{x}}{1 + \dfrac{1}{n\sigma^2}}.$$

If $n = m$ and $x_1, \cdots, x_m$ is the sample obtained, we have that the conditional expected value of $W(\xi, T)$ is

$$M\!\left(l_T(x_1, \cdots, x_m)\sqrt{m + \sigma^{-2}}\right) + cm + kE\left(U_m + l_T(x_1, \cdots, x_m)\right)^2, \tag{2.5}$$

where $U_m$ is a normally distributed chance variable with mean zero and variance $(m + \sigma^{-2})^{-1}$. The last term in (2.5) is therefore

$$k\left[(m + \sigma^{-2})^{-1} + l_T^2(x_1, \cdots, x_m)\right].$$

This is an even function of $l_T$, while the first term of (2.5) is a monotonically decreasing function of $l_T$. Thus (2.5) and hence $W^*(T, \sigma^2)$ will be minimized by taking $l_T$ non-negative. Now take the expected value of (2.5) over the set of samples where $n = m$. Application of the result of the preceding paragraph to the finite sums which approximate the integral gives the result that $W^*(T, \sigma^2)$ is minimized when $l_T(x_1, \cdots, x_n)$ is a function only of $m$. Hence we may restrict ourselves to consideration of procedures $T$ for which (2.5) takes the value

$$M\!\left(l_T\sqrt{m + \sigma^{-2}}\right) + cm + k\left[(m + \sigma^{-2})^{-1} + l_T^2\right]. \tag{2.6}$$

For any such procedure $T$, since $k$ and $c$ are fixed positive numbers (and $\sigma$ is held fixed for the present), the expression (2.6) takes its minimum for some value of $m$. Thus, in our quest for a procedure $T$ which will minimize $W^*(T, \sigma^2)$ we may restrict ourselves to procedures of fixed sample size. This fixed sample size and the (constant) value of $l_T$ are functions of $k$, $c$, and $\sigma^2$. For fixed $m$,

$$M\!\left(l\sqrt{m + \sigma^{-2}}\right) + cm + k\left[(m + \sigma^{-2})^{-1} + l^2\right]$$

has an absolute minimum at $l_m$, say, since it is a continuous function of $l$ $(l \ge 0)$ which approaches $\infty$ with $l$. The case $m = 0$ must be considered. (In this event $\bar{x} = 0$.) Now consider the sequence

$$\left\{M\!\left(l_m\sqrt{m + \sigma^{-2}}\right) + cm + k\left[(m + \sigma^{-2})^{-1} + l_m^2\right]\right\}$$

for $m = 0, 1, 2, \cdots$ ad inf. This sequence condenses only at $+\infty$. Hence there exists a value $N(k, c, \sigma^2)$ of $m$ for which the elements of this sequence have a minimum value. We may choose $N(k, c, \sigma^2)$ so that $\lim_{\sigma^2 \to \infty} N(k, c, \sigma^2)$ exists. (We verify easily that this is always possible.) Designate this limit by $N(k, c, \infty)$,





and the associated $l$ by $l(k, c, \infty)$. The $l$ associated with $N(k, c, \sigma^2)$ will be designated by $l(k, c, \sigma^2)$. Thus a best procedure for minimizing $W^*(T, \sigma^2)$ is to take the fixed number $N(k, c, \sigma^2)$ of observations, and use as upper bound for $\xi$ the quantity

$$\frac{\bar{x}}{1 + \dfrac{1}{\sigma^2 N(k, c, \sigma^2)}} + l(k, c, \sigma^2).$$

We see readily that

$$l(k, c, \infty) = \lim_{\sigma^2 \to \infty} l(k, c, \sigma^2)$$

and that

l(Jk, 0 , »)) Ihn r, tri }- \ Kk, r, a^ij . 


1 + 


„ j ^ 

(rW(F, r, <f"b 


b kl, r, n }. 


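Let $T(\sigma^2)$ be the procedure described above which is a best procedure $T$ in the sense of minimizing $W^*(T, \sigma^2)$ when $\sigma^2$ is the variance of $\xi$.

We now compute $W(\xi, T(\sigma^2))$ and obtain

$$W(\xi, T(\sigma^2)) = cN + k\left[\frac{N\sigma^4}{(1 + N\sigma^2)^2} + \left(l - \frac{\xi}{1 + N\sigma^2}\right)^2\right] + M\!\left(\sqrt{N}\left[1 + \frac{1}{N\sigma^2}\right]\left[l - \frac{\xi}{1 + N\sigma^2}\right]\right), \tag{2.7}$$

where for brevity we have written $N$ and $l$ for $N(k, c, \sigma^2)$ and $l(k, c, \sigma^2)$. Let

$$l - \frac{\xi}{1 + N\sigma^2} = x, \qquad \sqrt{N}\left[1 + \frac{1}{N\sigma^2}\right] = \sqrt{N} + \epsilon.$$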


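Then

$$W = cN + k\left[(\sqrt{N} + \epsilon)^{-2} + x^2\right] + M\!\left([\sqrt{N} + \epsilon]\,x\right), \tag{2.8}$$

$$\frac{dW}{dx} = 2kx - \frac{\sqrt{N} + \epsilon}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(\sqrt{N} + \epsilon)^2 x^2\right]. \tag{2.9}$$

The second term above is always of the same sign and the exponential decreases as $|x|$ increases. Thus $dW/dx = 0$ has the unique positive root $x^*$. Put $x^*$ for $x$ in $W$ (in 2.8) and call the result $W^*$. $W$ is a continuous function of $x$ and approaches $\infty$ as $|x| \to \infty$. Since the root $x^*$ is unique it follows that $W^*$ is the minimum value of $W$ with respect to $x$. Now $N(k, c, \sigma^2)$ is constant for $\sigma^2$ sufficiently large. Hence, for such $\sigma^2$, we have

$$\frac{dW^*}{d\epsilon} = \left\{2kx^* - \frac{\sqrt{N} + \epsilon}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(\sqrt{N} + \epsilon)^2 x^{*2}\right]\right\}\frac{dx^*}{d\epsilon} - \frac{x^*}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(\sqrt{N} + \epsilon)^2 x^{*2}\right] - \frac{2k}{(\sqrt{N} + \epsilon)^3}$$

$$= -\frac{2k}{(\sqrt{N} + \epsilon)^3} - \frac{x^*}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(\sqrt{N} + \epsilon)^2 x^{*2}\right], \tag{2.10}$$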





since $x^*$ is the root of $\partial W/\partial x = 0$. Also $\epsilon$ is positive and, for $\sigma^2$ sufficiently large, approaches zero monotonically as $\sigma^2$ approaches $\infty$. For $\epsilon > 0$ we have that $dW^*/d\epsilon < 0$, since $x^* > 0$. We conclude: For $\sigma^2$ sufficiently large,

$$\min_{\xi} W(\xi, T(\sigma^2))$$

increases monotonically with $\sigma^2$ and approaches

$$cN + k\left[\frac{1}{N} + x_N^2(k)\right] + M\!\left(\sqrt{N}\, x_N(k)\right),$$

where $N$ is short for $N(k, c, \infty)$ and $x_N(k)$ is the unique positive root of the equation in $x$

$$2kx = \frac{\sqrt{N}}{\sqrt{2\pi}} \exp\left[-\tfrac{1}{2} N x^2\right].$$

Going back to the definition of $l(k, c, \infty)$ we see that the latter satisfies the equation in $l$:

$$\frac{d}{dl}\left\{M(\sqrt{N}\, l) + k l^2\right\} = 0.$$

Hence

$$x_N(k) = l(k, c, \infty).$$

Thus the classical estimation procedure $C_0$ where one takes the fixed number $N(k, c, \infty)$ of observations and uses as upper bound for the mean $\bar{x} + l(k, c, \infty)$ is a minimax procedure $T$, i.e.,

$$\sup_{\xi} W(\xi, C_0) = \inf_{T} \sup_{\xi} W(\xi, T).$$
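The minimax sample size and bound can be computed numerically from the two ingredients above: the root $x_N(k)$ of $2kx = \sqrt{N/2\pi}\,\exp[-\tfrac{1}{2}Nx^2]$ and the constant risk $cN + k[1/N + x_N^2(k)] + M(\sqrt{N}\,x_N(k))$. The following is a minimal sketch under stated assumptions: $M$ is taken to be the standard normal upper tail, the root is found by bisection, and the search is limited to $1 \le N \le$ `n_max` (the case of no observations is ignored). All function names are hypothetical.

```python
import math


def M(h):
    """Upper tail of the standard normal, P{Z > h} (assumed definition of M)."""
    return 0.5 * math.erfc(h / math.sqrt(2.0))


def x_root(N, k, tol=1e-13):
    """Unique positive root of 2 k x = sqrt(N / (2 pi)) * exp(-N x^2 / 2)."""
    def g(x):
        return 2.0 * k * x - math.sqrt(N / (2.0 * math.pi)) * math.exp(-0.5 * N * x * x)

    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:      # g is increasing for x > 0 and g(0) < 0
        hi *= 2.0
    while hi - lo > tol:    # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def constant_risk(N, k, c):
    """Risk of 'take N observations, use xbar + l as upper bound' with l = x_N(k)."""
    l = x_root(N, k)
    return c * N + k * (1.0 / N + l * l) + M(math.sqrt(N) * l)


def minimax_design(k, c, n_max=2000):
    """Search N = 1..n_max for the smallest constant risk; returns (N, l, risk)."""
    best_N = min(range(1, n_max + 1), key=lambda N: constant_risk(N, k, c))
    return best_N, x_root(best_N, k), constant_risk(best_N, k, c)


if __name__ == "__main__":
    print(minimax_design(k=1.0, c=0.01))
```

Because the cost term $cN$ grows linearly, the search range only needs to exceed the risk at $N = 1$ divided by $c$; the default `n_max` is an arbitrary illustrative cap.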


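For fixed $N$, $x_N(k)$ decreases monotonically from $+\infty$ to 0 as $k$ increases from 0 to $+\infty$. Hence, for given positive integral $N_0$ and $l^* > 0$, there is a unique positive value $k_0$ such that $x_{N_0}(k_0) = l^*$. Consider the expression

$$B(m) = M\!\left(\sqrt{m}\, x_m(k_0)\right) + cm + k_0\left[\frac{1}{m} + [x_m(k_0)]^2\right], \tag{2.11}$$

where $m$ is a positive, continuous variable. We have

$$\frac{dB(m)}{dm} = c - \frac{k_0}{m^2} + \frac{dx_m(k_0)}{dm}\,\frac{\partial}{\partial x_m(k_0)}\left\{M\!\left(\sqrt{m}\, x_m(k_0)\right) + k_0 [x_m(k_0)]^2\right\} + \frac{\partial M\!\left(\sqrt{m}\, x_m(k_0)\right)}{\partial m}. \tag{2.12}$$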


The third term of the right member is identically zero because

$$2k_0 x_m(k_0) = \frac{\sqrt{m}}{\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2} m [x_m(k_0)]^2\right\}. \tag{2.13}$$





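Further we have

$$\frac{d^2 B(m)}{dm^2} = \frac{2k_0}{m^3} - \frac{d}{dm}\left\{\frac{x_m(k_0)}{2\sqrt{2\pi m}} \exp\left(-\tfrac{1}{2} m [x_m(k_0)]^2\right)\right\} = \frac{2k_0}{m^3} - \frac{d}{dm}\left\{\frac{k_0 [x_m(k_0)]^2}{m}\right\}.$$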

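For typographic simplicity we shall use $y$ for $x_m(k_0)$ in the course of the next few lines. From (2.13) we obtain

$$\log 2k_0 + \log y = -\log \sqrt{2\pi} + \tfrac{1}{2}\log m - \tfrac{1}{2} m y^2,$$

$$\frac{1}{y}\frac{dy}{dm} = \frac{1}{2m} - \frac{y^2}{2} - m y \frac{dy}{dm},$$

$$\frac{dy}{dm} = \frac{y(1 - m y^2)}{2m(1 + m y^2)}.$$

Hence

$$\frac{d^2 B(m)}{dm^2} = 2k_0 m^{-3} + k_0 m^{-2} y^2 - 2k_0 m^{-1} y \frac{dy}{dm} = 2k_0 m^{-3} + k_0 m^{-2} y^2 - \frac{k_0 y^2 (1 - m y^2)}{m^2 (1 + m y^2)} = 2k_0 m^{-3} + \frac{2 k_0 y^4}{m(1 + m y^2)}.$$

Since $c > 0$, we have

$$\lim_{m \to 0} B(m) = \lim_{m \to \infty} B(m) = +\infty.$$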


Hence there exists a value of $m$ for which $B(m)$ takes its minimum value. If in $dB(m)/dm$ we put $m = N_0$ and set the resulting expression equal to zero, we obtain an equation in $c$ whose unique solution $c_0$, if it is positive, assures us that, when $c = c_0$ and $k = k_0$, $B(m)$ takes its minimum at $m = N_0$. A simple computation gives

$$c_0 = \frac{k_0}{N_0^2} + \frac{l^*}{2\sqrt{2\pi N_0}} \exp\left\{-\tfrac{1}{2} N_0 l^{*2}\right\}. \tag{2.16}$$

Actually we are interested in considering $B(m)$ only for positive integral values of $m$. We see readily that the minimum of $B(m)$ occurs then at $m = N_0$ when $c$ is such that

$$c_1(N_0, k_0) < c < c_2(N_0, k_0), \tag{2.17}$$

with $c_1$ and $c_2$ roots of the following equations in $c$:

$$B(N_0) = B(N_0 + 1),$$

$$B(N_0) = B(N_0 - 1).$$

(If $N_0 = 1$, then $c_2 = \infty$.)
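The converse construction (given $N_0$ and $l^*$, find $k_0$ and the admissible range of $c$) can be sketched as follows. Here $k_0$ is obtained by solving (2.13) directly with $m = N_0$ and $x_{N_0}(k_0) = l^*$, and $c_1$, $c_2$ follow from the two equations $B(N_0) = B(N_0 \pm 1)$, which are linear in $c$. As before, $M$ is assumed to be the standard normal upper tail and every function name is hypothetical.

```python
import math


def M(h):
    """Upper tail of the standard normal, P{Z > h} (assumed definition of M)."""
    return 0.5 * math.erfc(h / math.sqrt(2.0))


def x_root(m, k, tol=1e-13):
    """Unique positive root of 2 k x = sqrt(m / (2 pi)) * exp(-m x^2 / 2)."""
    def g(x):
        return 2.0 * k * x - math.sqrt(m / (2.0 * math.pi)) * math.exp(-0.5 * m * x * x)

    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def k0_from(N0, l_star):
    """k0 such that x_{N0}(k0) = l*, read off from (2.13)."""
    return math.sqrt(N0 / (2.0 * math.pi)) * math.exp(-0.5 * N0 * l_star ** 2) / (2.0 * l_star)


def G(m, k0):
    """B(m) of (2.11) without the cost term c*m."""
    x = x_root(m, k0)
    return M(math.sqrt(m) * x) + k0 * (1.0 / m + x * x)


def cost_interval(N0, l_star):
    """Return (k0, c1, c2) such that B(m) is minimized at m = N0 for c1 < c < c2."""
    k0 = k0_from(N0, l_star)
    c1 = G(N0, k0) - G(N0 + 1, k0)                            # B(N0) = B(N0 + 1)
    c2 = math.inf if N0 == 1 else G(N0 - 1, k0) - G(N0, k0)   # B(N0) = B(N0 - 1)
    return k0, c1, c2


if __name__ == "__main__":
    print(cost_interval(N0=4, l_star=0.5))
```

The interval is non-empty because $d^2B/dm^2 > 0$, so the successive differences of $B$ without its cost term decrease in $m$.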





Let $C_0(N_0, l^*)$ be the (non-sequential) procedure where one takes $N_0$ observations and uses $\bar{x} + l^*$ as upper bound for the mean. Choose $k = k_0$ and $c$ such that (2.17) is satisfied. Then

$$W(\xi, C_0) = cN_0 + k_0\left[\frac{1}{N_0} + l^{*2}\right] + M(\sqrt{N_0}\, l^*)$$

identically in $\xi$. $C_0(N_0, l^*)$ is a procedure $T$ such that

$$\sup_{\xi} W(\xi, C_0) = \inf_{T} \sup_{\xi} W(\xi, T). \tag{2.18}$$

Whenever $c$ and $k$ are given, the $N$ and $l$ of the minimax solution may be obtained as follows: First we obtain an integer $N$ such that

$$c_1(N, k) < c < c_2(N, k).$$

Knowing $N$ and $k$ we can then solve for $l$.

The results of this section may be summarized as follows: For every positive $c$ and $k$ there exists a classical estimation procedure $C_0(N, l)$ with positive integral $N$ and $l > 0$ such that (2.18) holds. Conversely, for every such pair $(N, l)$ there exists a positive pair $(c, k)$ so that (2.18) holds. A method of finding one member of the pair of couples $(c, k)$ and $(N, l)$ when the other is given, has been indicated above.

Let $T_1$ be any procedure for giving an upper bound for $\xi$. We shall say that $T_1$ is optimum if for any other procedure $T_2$ such that

$$\sup_{\xi} q(\xi, T_2) \le \sup_{\xi} q(\xi, T_1),$$

$$\sup_{\xi} \lambda(\xi, T_2) \le \sup_{\xi} \lambda(\xi, T_1),$$

we have

$$\sup_{\xi} n(\xi, T_2) \ge \sup_{\xi} n(\xi, T_1).$$

It is easy to prove that the classical procedure $C_0$ with any positive $l$ and positive integral $N$ is optimum by using the results of the last paragraph. For let $1 - \alpha = M(l\sqrt{N})$ and let $k$ and $c$ be the corresponding parameters. We have then

$$\sup_{\xi} q(\xi, T_2) + k \sup_{\xi} \lambda(\xi, T_2) + c \sup_{\xi} n(\xi, T_2) \ge \sup_{\xi} \{q(\xi, T_2) + k\lambda(\xi, T_2) + c\,n(\xi, T_2)\} \ge (1 - \alpha) + k\left[\frac{1}{N} + l^2\right] + cN.$$

Since $\sup_{\xi} q(\xi, T_2) \le (1 - \alpha)$ and $\sup_{\xi} \lambda(\xi, T_2) \le 1/N + l^2$, we must have

$$\sup_{\xi} n(\xi, T_2) \ge N,$$

which is the desired result.

In a general imprecise way we may say that an estimation procedure is the better the smaller the three quantities

$$\beta_1(T) = \sup_{\xi} q(\xi, T), \qquad \beta_2(T) = \sup_{\xi} \lambda(\xi, T), \qquad \beta_3(T) = \sup_{\xi} n(\xi, T).$$



We can now assert the following: No sequential procedure $T$ can be superior to the classical fixed sample procedure $C$ in the sense that

$$\beta_i(T) \le \beta_i(C) \qquad \text{for } i = 1, 2, 3$$

and the inequality sign holds for at least one $i$.

In concluding this sectum we may rennirk tluit the ctiM* a < ), i / < 0, 
may be handled in the same manner a.H above except that W’e use \/m) 

in place of Al(l Vm). 

3. Miscellaneous results; point estimation. Without going into the necessarily involved details, we content ourselves with pointing out that the problem of estimating sequentially the mean of a normal distribution by a finite interval of length not specified in advance, can be solved in similar fashion. As before let $\xi$ be the unknown mean of a normal distribution with unit variance, where $\xi$ may be any real value. We want to estimate $\xi$ by an interval

$$\{L_1(x_1, \cdots, x_n),\; L_2(x_1, \cdots, x_n)\}.$$

Let $c$, $k_1$, and $k_2$ be positive constants and consider the problem of minimizing the supremum with respect to $\xi$ of

$$1 - P\{L_1 < \xi < L_2 \mid \xi, G'\} + c\,n(\xi, G') + k_1 E\{(L_1 - \xi)^2 \mid \xi, G'\} + k_2 E\{(L_2 - \xi)^2 \mid \xi, G'\},$$

where $G'$ is the generic designation of the estimation procedure. As before, employ an a priori normal distribution of $\xi$ with mean zero and variance $\sigma^2$, and let $\sigma \to \infty$. A fixed sample size procedure will be a minimax solution. It will possess optimum properties similar to those described in the preceding sections. The problem of minimizing the supremum with respect to $\xi$ of

$$1 - P\{L_1 < \xi < L_2 \mid \xi, G'\} + c\,n(\xi, G') + kE\{(L_2 - L_1)^2 \mid \xi, G'\}$$

can be treated similarly.

Suppose the sample size is fixed in advance. The problem of finding an estimate which will minimize

$$\sup_{\xi}\left[1 - P\{L_1 < \xi < L_2 \mid \xi, G'\} + k_1 E\{(L_1 - \xi)^2 \mid \xi, G'\} + k_2 E\{(L_2 - \xi)^2 \mid \xi, G'\}\right]$$

or

$$\sup_{\xi}\left[1 - P\{L_1 < \xi < L_2 \mid \xi, G'\} + kE\{(L_2 - L_1)^2 \mid \xi, G'\}\right]$$

can be treated by the method of the preceding sections.

The problem of estimating (sequentially or with fixed sample size) the means of a multivariate normal distribution with known covariance matrix can be treated in similar fashion.

Suppose it is desired to estimate sequentially the mean $\xi$ $(-\infty < \xi < \infty)$ of a normal distribution with unit variance by means of a chance point $\xi'(x_1, \cdots, x_n)$. Let $R(\xi, \xi')$ be the Wald risk function (cf. [2]), a non-negative function which measures the loss incurred in using a particular value $\xi'$ as an estimate when $\xi$ is the actual value. The functions $\xi'(x_1, \cdots, x_n)$ and $R(\xi, \xi')$ must have suitable measurability properties for which we refer the reader to [2]. Let us seek a procedure $G^*$ such that

$$\sup_{\xi}\left[E\{R(\xi, \xi') \mid \xi, G^*\} + c\,n(\xi, G^*)\right] = \inf_{G} \sup_{\xi}\left[E\{R(\xi, \xi') \mid \xi, G\} + c\,n(\xi, G)\right].$$

Here $n(\xi, G)$ is the average number of observations under $G$ when $\xi$ is the "true" mean. The procedure $G^*$ will be called a minimax solution. We shall assume that $R(a, b)$ is a monotonically non-decreasing function of $|a - b|$, and that there exists a positive number $\eta$ such that



tdl, j 1 exj)- 



ii 

) 

/ 


ih 


^ . 


As examples of functions with these properties we may cite

$$R(a, b) = |a - b|,$$

$$R(a, b) = (a - b)^2.$$

As before, assume temporarily that $\xi$ is normally distributed with mean zero and variance $\sigma^2$. We verify without difficulty that a solution $G = G_0$ which minimizes the corresponding average, over this a priori distribution, of $E\{R(\xi, \xi') \mid \xi, G\} + c\,n(\xi, G)$ is the following: $n$ is identically a suitable constant, say $N$, and $\xi_0$ is $\bar{x}(1 + 1/N\sigma^2)^{-1} = \bar{x}h$, say, so that $h < 1$. For this solution we have

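$$E\{R(\xi, \xi_0) \mid \xi, G_0\} + c\,n(\xi, G_0) = cN + \sqrt{\frac{N}{2\pi}} \int_{-\infty}^{\infty} R(\xi, \bar{x}h) \exp\left\{-\frac{N}{2}(\bar{x} - \xi)^2\right\} d\bar{x}.$$

Write $u = \bar{x} - \xi$. Then

$$R(\xi, \bar{x}h) = R(\xi, h[\xi + u]) = R(0, hu - [1 - h]\xi),$$

$$\int_{-\infty}^{\infty} R(\xi, \bar{x}h) \exp\left\{-\frac{N}{2}(\bar{x} - \xi)^2\right\} d\bar{x} = \int_{-\infty}^{\infty} R(0, hu - [1 - h]\xi) \exp\left\{-\frac{Nu^2}{2}\right\} du$$

$$= \frac{1}{h}\int_{-\infty}^{\infty} R(0, v) \exp\left\{-\frac{N}{2h^2}(v + [1 - h]\xi)^2\right\} dv.$$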

Because of the assumptions on the function $R$ the last expression is a minimum when $\xi = 0$. We may always choose $N$ such that, for large enough $\sigma^2$, the integer $N$ is a constant, say $N_0$. Also $h \to 1$ as $\sigma^2 \to \infty$. Thus we conclude that the following is a minimax solution: $n \equiv N_0$ and $\xi_0 \equiv \bar{x}$. If any estimation procedure $G$ is such that

$$\sup_{\xi} E\{R(\xi, \xi') \mid \xi, G\} \le \sup_{\xi} E\{R(\xi, \bar{x}) \mid \xi, G_0\},$$

then

$$\sup_{\xi} n(\xi, G) \ge N_0.$$


If the restrictions imposed ahovi* on H aro sati^lied ami if the .siiinple must 
always be of given size E, the above argiimeiil still holds whim 1;.V ij, and 
shows that the estimate I miniiuizes 


with respect to f. 


mpEimi f)l 

t 




[1] C. Stein and A. Wald, "Sequential confidence intervals for the mean of a normal distribution with known variance," Annals of Math. Stat., Vol. 18 (1947), pp. 427-433.

[2] A. Wald, "Statistical decision functions," Annals of Math. Stat., Vol. 20 (1949), pp. 165-205.



ASYMPTOTIC PROPERTIES OF THE WALD-WOLFOWITZ TEST 

OF RANDOMNESS 

By Gottfried E. Noether

New York University

1. Summary. The paper investigates certain asymptotic properties of the test of randomness based on the statistic $R_h = \sum_{i=1}^{n} x_i x_{i+h}$ proposed by Wald and Wolfowitz. It is shown that the conditions given in the original paper for asymptotic normality of $R_h$ when the null hypothesis of randomness is true can be weakened considerably. Conditions are given for the consistency of the test when under the alternative hypothesis consecutive observations are drawn independently from changing populations with continuous cumulative distribution functions. In particular a downward (upward) trend and a regular cyclical movement are considered. For the special case of a regular cyclical movement of known length the asymptotic relative efficiency of the test based on ranks with respect to the test based on original observations is found. A simple condition for the asymptotic normality of $R_h$ for ranks under the alternative hypothesis is given. This asymptotic normality is used to compare the asymptotic power of the $R_h$-test with that of the Mann $T$-test in the case of a downward trend.

2. Introduction. The hypothesis of randomness, i.e., the assumption that the chance variables $X_1, \cdots, X_n$ have the joint cumulative distribution function (cdf) $F(x_1, \cdots, x_n) = F(x_1) \cdots F(x_n)$ where $F(x)$ may be any cdf, is basic in many statistical problems. Several tests of randomness designed to detect changes in the underlying population have been suggested, however mostly on intuitive grounds. Very seldom has the actual performance of a test with respect to a given class of alternatives been investigated. It is the intention of this paper to carry out such an investigation for the particular test based on the statistic

$$R_h = \sum_{i=1}^{n} x_i x_{i+h}, \qquad x_{n+j} = x_j,$$

proposed by Wald and Wolfowitz [1]. It is suggested in [1] that this test is suitable if the alternative to randomness is the existence of a trend or a regular cyclical movement. Both these cases will be treated.

Let $a_1, \cdots, a_n$ be observations on the chance variables $X_1, \cdots, X_n$ and assume that the hypothesis of randomness is true. (Henceforth this hypothesis will be denoted by $H_0$ while the hypothesis that an alternative to randomness is true will be denoted by $H_1$.) Restricting then $X_1, \cdots, X_n$ to the subpopulation of permutations of $a_1, \cdots, a_n$, any one of the $n!$ possible permutations is equally likely, and the distribution of $R_h$ in this subpopulation can be found. If






' '<* .sn . 


fhi" rif .‘<i|t"iis!sf jii* * .j j 
IHfeitive in1''U‘‘r. ih*' !«■;•? •»> 
ffn and //• wi»>i3 

vahir-j*. T1 h‘ partn-ulur <' 

tlw* pmvf'rof tlif* twiMnsfU t*’. 3<«‘ - .. .... .... . . — 

Denote the expected value and variance of $R_h$ in the subpopulation of equally likely permutations of $n$ observations $a_1, \cdots, a_n$ by $E^0 R_h$ and $V^0 R_h$, respectively. Then it is shown in [1] that if $h$ is prime to $n$


11 m • 

;.b : 

S 



n’ w}i'>re m is a 

'1 'S.i' 

i’ *■' ’■ 

I ‘mv 

»■ rii 

•' 7/3 

ibU' vahnw of 

a _„ It 

4 ’ ' 1 * 5 ) 1 » 

*-!i* » 

“iv 1 

'.ib.* “1 

r 

■ 'dU' nf tlwwi m 

ri**' < r,‘ ' 

",d V. 

■d'!* 

!, • .bj * 


i( n,-* to iii.'ixiinm’ 

1 K.t {l.c 

' 1 1 

-•f.ir 

' Tin’ 1 k, 

nii'l 

*'! ulornfion. 


$$E^0 R_h = \frac{A_1^2 - A_2}{n - 1} \tag{2.1}$$

and

$$V^0 R_h = \frac{A_2^2 - A_4}{n - 1} + \frac{A_1^4 - 4A_1^2 A_2 + 4A_1 A_3 + A_2^2 - 2A_4}{(n - 1)(n - 2)} - \frac{(A_1^2 - A_2)^2}{(n - 1)^2}, \tag{2.2}$$

where $A_r = a_1^r + \cdots + a_n^r$, $(r = 1, 2, 3, 4)$. Actually (2.1) and (2.2) are valid as soon as $n > 2h$.
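For small $n$ the permutation moments in (2.1) and (2.2) can be compared with a direct enumeration of all $n!$ orderings. The sketch below is illustrative only: it uses the circular definition $x_{n+j} = x_j$ of $R_h$, the function names are hypothetical, and the brute-force routine is practical only for very small samples.

```python
import itertools


def serial_stat(x, h):
    """Circular serial statistic R_h = sum_i x_i * x_{i+h}, with x_{n+j} = x_j."""
    n = len(x)
    return sum(x[i] * x[(i + h) % n] for i in range(n))


def permutation_moments_formula(a):
    """Mean and variance of R_h over all permutations of a, from (2.1) and (2.2)."""
    n = len(a)
    A1, A2, A3, A4 = (sum(v ** r for v in a) for r in (1, 2, 3, 4))
    mean = (A1 ** 2 - A2) / (n - 1)
    var = ((A2 ** 2 - A4) / (n - 1)
           + (A1 ** 4 - 4 * A1 ** 2 * A2 + 4 * A1 * A3 + A2 ** 2 - 2 * A4)
           / ((n - 1) * (n - 2))
           - (A1 ** 2 - A2) ** 2 / (n - 1) ** 2)
    return mean, var


def permutation_moments_bruteforce(a, h):
    """The same two moments obtained by enumerating all n! orderings."""
    values = [serial_stat(p, h) for p in itertools.permutations(a)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


if __name__ == "__main__":
    sample = [1.0, 2.5, -0.5, 4.0, 3.0]
    print(permutation_moments_formula(sample))
    print(permutation_moments_bruteforce(sample, h=1))
```

When $h$ is prime to $n$ the lag-$h$ pairs are a relabelling of the lag-1 pairs, which is why the formulas do not involve $h$.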

Let $R_h^0 = (R_h - E^0 R_h)/\sqrt{V^0 R_h}$. Then it is also shown in [1] that if $h$ is prime to $n$, $R_h^0$ is asymptotically normally distributed with mean 0 and variance 1 provided the $a_i$, $(i = 1, \cdots, n)$, satisfy condition $W$:

$$\frac{\dfrac{1}{n}\sum_{i=1}^{n} (a_i - \bar{a})^r}{\left[\dfrac{1}{n}\sum_{i=1}^{n} (a_i - \bar{a})^2\right]^{r/2}} = O(1),^{1} \qquad (r = 3, 4, \cdots),$$

where $\bar{a} = n^{-1}\sum_{i=1}^{n} a_i$.


It is easily seen that condition $W$ is satisfied when the original observations are replaced by ranks. When the $a_1, \cdots, a_n$ are independent observations on the same chance variable $X$, condition $W$ is satisfied with probability 1 provided $X$ has positive variance and finite moments of all orders. It is interesting to compare this condition for asymptotic normality of $R_h$ in the population of permutations of observations on the chance variable $X$ with the condition for asymptotic normality of $R_h$ under random sampling. For this case Hoeffding and Robbins [3] have shown that it is sufficient to assume that $X$ has a finite absolute moment of order 3. Thus it is desirable to weaken condition $W$. This will be done in Section 3.

$^1$ The symbols $O$, $o$, $\Omega$ and $\sim$, to be used later, have their usual meaning. See, for example, Cramér [2], p. 122.

In further sections the consistency and efficiency of the test based on $R_h$ will be examined assuming that under the alternative hypothesis observations, though still independent, are drawn from changing populations. Throughout the paper the circularly defined statistic is used. However, if with probability 1

d- + TaT/, =- o(/l’^), 

it is seen that asymptotically the test based on the non-circular statistic

$$\sum_{i=1}^{n-h} x_i x_{i+h}$$

has the same properties as that based on $R_h$. We find




H — h 
n{n — 1) 


(-i4l — /Ij), 




n ■ 
n{n 


- h 
~ 1 ) 


- rU) + ---.M-- (AlA, - /15 - 2Hi4a + 2Ad 


, (h__- h ~ _l) (a - h ~ 2] + 2(/i - 1) , , 
n\7i — IKn - 2j(a — 3) ' " ^ ^ 


GaIijIi -f- SAjAs -j- 3Ai — GA<) 


( n - hy 

n“(ra — I)** 


(A? - A,)^ 


3. Asymptotic normality of $R_h$ under randomization. Let the set of chance variables $X_1, \cdots, X_n$ be defined on the $n!$ equally likely permutations of $n$ numbers $\mathfrak{A}_n = (a_1, \cdots, a_n)$. Then we have

THEOREM 1: The distribution of $R_h^0$ tends to the normal distribution with mean 0 and variance 1 as $n \to \infty$ provided

$$\frac{\sum_{i=1}^{n} (a_i - \bar{a})^r}{\left[\sum_{i=1}^{n} (a_i - \bar{a})^2\right]^{r/2}} = o\left(n^{(2-r)/4}\right), \qquad (r = 3, 4, \cdots), \tag{3.1}$$

where $\bar{a} = n^{-1}\sum_{i=1}^{n} a_i$.

REMARK: The set $\mathfrak{A}_n$ need not be a subset of $\mathfrak{A}_{n+1}$.

The proof of this theorem will be omitted, since it is very similar to the proof of another theorem by the author [4].

THEOREM 2: If the $a_1, a_2, \cdots$ are independent observations on a chance variable $X$ having positive variance and a finite absolute moment of order $4 + \delta$, $\delta > 0$, condition (3.1) is satisfied unless possibly an event of probability 0 has occurred.

The proof of this theorem will be based on Markoff's method for proving the central limit theorem in the Liapounoff form.$^2$ Thus we shall show that there exists a sequence of sequences $\mathfrak{B}_n = (b_{n1}, \cdots, b_{nn})$ such that unless possibly an event of probability 0 has occurred, (i) there exists an index $n'$ (depending on the given sequence) such that for $n > n'$, $\mathfrak{B}_n$ and $\mathfrak{A}_n$ are identical, and (ii) the $b_{ni}$ satisfy condition (3.1) expressed in terms of the $b_{ni}$, $(i = 1, \cdots, n)$.

$^2$ See, for example, Uspensky [5], pp. 388-95.

It is no restriction to assume that $EX = 0$, since the addition of one and the same constant to every $a_i$ does not change (3.1). Let

$$N = N(n) = n^{1/(4+\delta/2)}$$

and define for $i = 1, \cdots, n$

$$b_{ni} = a_i, \quad c_{ni} = 0, \qquad \text{if } |a_i| \le N(n),$$

$$b_{ni} = 0, \quad c_{ni} = a_i, \qquad \text{if } |a_i| > N(n),$$

so that $a_i = b_{ni} + c_{ni}$. Then $b_{ni}$ and $c_{ni}$ can be considered as observations on chance variables $Y_n$ and $Z_n$, respectively, where

$$Y_n = X, \quad Z_n = 0, \qquad \text{if } |X| \le N(n),$$

$$Y_n = 0, \quad Z_n = X, \qquad \text{if } |X| > N(n).$$

Further let $p_n = P\{Z_n \ne 0\}$, $\alpha_r(U) = EU^r$, $\beta_r(U) = E|U|^r$ where $U = X$, $Y_n$, $Z_n$ and $r$ is positive integral, if these moments exist, $\beta_{4+\delta} = E|X|^{4+\delta}$, and finally, let $F(x)$ be the cdf of $X$.

In order to prove (i) consider the infinitely dimensional sample space $\Omega$ with the generic point $\omega = \omega(a_1, a_2, \cdots)$ and let $E_n = \{\omega : |a_n| > N(n)\}$, $(n = 1, 2, \cdots)$. Then $E_n$ has probability measure $p_n$. We shall show that $\sum_{n=1}^{\infty} p_n$ converges. Since

$$\beta_{4+\delta} = \int_{-\infty}^{\infty} |x|^{4+\delta}\, dF(x) \ge N^{4+\delta}\int_{|x| > N} dF(x) = N^{4+\delta} p_n,$$

we find

$$\sum_{n} p_n \le \beta_{4+\delta} \sum_{n} \frac{1}{N^{4+\delta}} = \beta_{4+\delta} \sum_{n} \frac{1}{n^{(4+\delta)/(4+\delta/2)}}.$$

Now $(4 + \delta)/(4 + \delta/2) > 1$ and the infinite sum converges. It follows that the set $E$ of points which belong to infinitely many sets $E_n$ has probability measure 0. Thus for every point $\omega \in \Omega$ except those in a set of measure 0 there exists an index $n'_\omega$ (depending on $\omega$) such that for $n > n'_\omega$

$$|a_n| \le N(n). \tag{3.2}$$

Further, since $n'_\omega$ is finite and $N(n) \to \infty$, it follows that for these points there exists a second index $n''_\omega \ge n'_\omega$ such that in addition to (3.2) $|a_n| \le N(n''_\omega)$, $(n = 1, \cdots, n'_\omega)$. Thus except on a set of measure 0 the sequences $\mathfrak{B}_n$ are identical with the sequences $\mathfrak{A}_n$ for $n > n''_\omega$. This proves (i).

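In proving (ii) let $B_{nr} = \sum_{i=1}^{n} b_{ni}^r$, $(n, r = 1, 2, \cdots)$. We first note that under the assumptions of the theorem $n^{-1}A_r \to \alpha_r(X)$ for $r = 1, 2, 3, 4$ except on a set of measure 0. Thus except on a set of measure 0

$$A_1 = o(n), \quad A_2 = \Omega(n), \quad A_3 = O(n), \quad A_4 = O(n),$$

and therefore by the argument used in proving (i) again except on a set of measure 0

$$B_{n1} = o(n), \quad B_{n2} = \Omega(n), \quad B_{n3} = O(n), \quad B_{n4} = O(n).$$

It follows that in order to prove (ii) it is sufficient to show that

$$B_{nr} = o\left(n^{(r+2)/4}\right), \qquad (r = 5, 6, \cdots), \tag{3.3}$$

except on a set of measure 0.

Now for $r \ge 5$

$$|\alpha_r(Y_n)| \le \beta_r(Y_n) \le N^{r-4}\beta_4(Y_n) \le N^{r-4}\beta_4(X),$$

and therefore

$$\alpha_r(Y_n) = O(N^{r-4}) = O\left(n^{(r-4)/(4+\delta/2)}\right).$$

It follows that

$$EB_{nr} = n\alpha_r(Y_n) = O\left(n^{(r+\delta/2)/(4+\delta/2)}\right)$$

and

$$\operatorname{var} B_{nr} = n \operatorname{var} Y_n^r = n[\alpha_{2r}(Y_n) - \alpha_r^2(Y_n)] = O\left(n^{(2r+\delta/2)/(4+\delta/2)}\right),$$

so that

$$\sigma(B_{nr}) = O\left(n^{(r+\delta/4)/(4+\delta/2)}\right).$$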

AKsuine now that for .'■ome r > 5 (3.3) is not satisfied on a set /'V having 
measure «, > e > 0. W'e .shall .show that this assumption leads to a contradiction, 
and that therefore (3.3) is true. 

Choose $e$ such that

$$1/2 < e < (16 + r\delta)/(32 + 4\delta). \tag{3.4}$$

Since $r \ge 5$, (3.4) can always be satisfied. Then the infinite sum $\sum_{n=1}^{\infty} (1/n^{2e})$ converges, and a positive constant $d$ can be found in such a way that



If we then write the Tchebysheff inequality

$$P\{|B_{nr} - EB_{nr}| > d n^{e}\sigma(B_{nr})\} \le 1/d^2 n^{2e},$$

it is scon that except on a sot having at most measure p 
Now for $r \ge 5$

$$(r + \delta/2)/(4 + \delta/2) < r/4$$

and by (3.4)

$$e + (r + \delta/4)/(4 + \delta/2) = e + r/4 + (\delta/4 - r\delta/8)/(4 + \delta/2) < r/4 + (16 + 2\delta)/(32 + 4\delta) = (r + 2)/4,$$



ufH i}'i £<■ :i >, a?" 1. 


NO lliu! Jljf liwa-MUii’ oi )li< ]. . 

aj-Miisjutioit, Omn 5tr<n JI4', 1 Ik •>?! m 


ii!. »'•na.t'Jif !,■ our 


4. Consistency. To prove consistency of a test based on permutations of observations $a_1, \cdots, a_n$ the following procedure can be applied. Let the test statistic be $S_n = S_n(a_1, \cdots, a_n)$, and denote by $E^0 S_n = E^0 S_n(a_1, \cdots, a_n)$ and $V^0 S_n = V^0 S_n(a_1, \cdots, a_n)$ the mean and variance of $S_n$ under the assumption that the set of chance variables $X_1, \cdots, X_n$ is restricted to the subpopulation consisting of the equally likely permutations of the observations. Assume that for the alternatives under consideration large values of $S_n$ are critical. Then we reject the null hypothesis whenever $S_n - E^0 S_n \ge k\sqrt{V^0 S_n}$ where $k$ is some positive constant depending on the limiting distribution of $S_n$ under the assumption of equally likely permutations and the level of significance. Thus in order to prove consistency we have to show that

$$\lim_{n \to \infty} P\{S_n - E^0 S_n \ge k\sqrt{V^0 S_n} \mid H_1\} = 1. \tag{4.1}$$

(4.1) will be satisfied if for some $\epsilon > 0$

$$\lim_{n \to \infty} P\left\{\frac{S_n - E^0 S_n}{\sqrt{nV^0 S_n}} \ge \epsilon \,\Big|\, H_1\right\} = 1.$$

Thus we shall have proved consistency, if we can show that when $H_1$ is true, $E^0 S_n/\sqrt{nV^0 S_n}$ converges in probability to 0 and there exists some $\epsilon > 0$ such that $\lim_{n \to \infty} P\{S_n/\sqrt{nV^0 S_n} \ge \epsilon \mid H_1\} = 1$.

Applying this method to our problem and noting that a corresponding procedure could have been used in the case when small values of $S_n$ are critical, we obtain

THEOREM 3: The test based on $R_h$ is consistent with respect to alternatives for which

$$\operatorname*{plim}_{n \to \infty} \frac{E^0 R_h}{\sqrt{nV^0 R_h}} = 0 \tag{4.2}$$

and there exists some $\epsilon > 0$ such that

$$\lim_{n \to \infty} P\left\{\frac{|R_h|}{\sqrt{nV^0 R_h}} \ge \epsilon \,\Big|\, H_1\right\} = 1, \tag{4.3}$$

where $E^0 R_h$ and $V^0 R_h$ are given by (2.1) and (2.2), respectively.

In what follows it will always be assumed that under the alternative hypothesis observations are independent from chance variables $X_n$ with continuous cdf's $F_n(x)$, $(n = 1, 2, \cdots)$. We shall often have the opportunity to make use of the fact that the test is not changed if one and the same constant is subtracted from every observation. This will be helpful in reducing our problem to one for which (4.2) is true.

Let $a_i$ be the rank of the observation $x_i$ on the chance variable $X_i$, $(i = 1, \cdots, n)$. It is no restriction to assume that these ranks take the special form

$$-(n - 1)/2, \quad -(n - 3)/2, \quad \cdots, \quad (n - 1)/2,$$

so that $A_1 = 0$, $A_2 = \frac{1}{12}(n^2 - 1)n = \Omega(n^3)$ and

$$V^0 R_h \sim \frac{1}{n} A_2^2 \sim \frac{1}{144} n^5 = \Omega(n^5), \tag{4.4}$$

and therefore (4.2) is always satisfied.

Before we can find conditions under which (4.3) is satisfied, we have to investigate the expected value and variance of $R_h$ when $H_1$ is true. For this purpose write $a_i = \sum_{j=1}^{n} y_{ij}$, $(i = 1, \cdots, n)$, where

$$y_{ij} = \begin{cases} \tfrac{1}{2} & \text{if } x_i > x_j, \\ -\tfrac{1}{2} & \text{if } x_i < x_j, \end{cases} \qquad y_{ii} = 0. \tag{4.5}$$
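The representation (4.5) can be checked on a small sample: summing the $y_{ij}$ over $j$ reproduces the centered ranks $-(n-1)/2, \cdots, (n-1)/2$, so that $A_1 = 0$ and $A_2 = (n^3 - n)/12$. The sketch below assumes there are no ties (as is the case with probability 1 for continuous cdf's); the function name is hypothetical.

```python
def centered_ranks(x):
    """a_i = sum_j y_ij with y_ij = +1/2 if x_i > x_j, -1/2 if x_i < x_j, y_ii = 0.

    Assumes all observations are distinct.
    """
    n = len(x)
    return [sum(0.5 if x[i] > x[j] else -0.5 for j in range(n) if j != i)
            for i in range(n)]


if __name__ == "__main__":
    obs = [3.2, -1.0, 0.5, 7.1, 2.2]
    a = centered_ranks(obs)
    print(a)                              # centered ranks of the observations
    print(sum(a), sum(v * v for v in a))  # A_1 and A_2
```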

Then if P| A', < A'/l =- 77 ,■,, ( 1,7 = I, • • , n), we find 

^O/.j ='■ ^p.7 - j(l “ JK,) ~ p.j ~ i = e, 7 , (say). 

Further, 


n n n 


“ S E E y>7 Vi+f'.k) 

1-1 j~i k-i 


!/n+Iifc “ Vlk' 


(*1.0) 

Therefore 
(4.7) 

and 

Var ifji = A Z) Z^ U<i ll'+ktki/afiya+h.r ^ Zv ytjVi+hikE ^3 2/a/sl/a+)i, 
affy 


mi//.) = EEE 4- 0(n^ 

% ; * 


(4.8) 


i,k 


apy 


= E E U'^y.iy>+h.ky«»y«-kh,y - J^y,}y.+h,kEyapya+h.y). 

tj*. aPy 


In (4.8) the expression in parentheses is 0 unless one of the Greek indices (including $\alpha + h$) equals one of the Roman indices. Therefore $\operatorname{var}(R_h \mid H_1) = O(n^5)$.

It then follows from (4.4) that

fO/v/nPA/. - Tt 12 lim -„V(Aft|Ax), 

Tt n -*» 


and we can state the following corollary to Theorem 3:

COROLLARY: When using ranks, the test based on $R_h$ is consistent, if under the alternative hypothesis


(4 9) 


Aeee etlS\+h,k — 1^(1), 


“1 7—1 k-^l 


ivhwe f„ = P{X, < Xy] - i 





Sincf* c„ “ , wo oaii svritc 

szi;-..-,.'..-..v^ -i-. f»rt. 

I j i *>),•« 

and the tot Is cansisfont if 

C4.10) litu \ I. X (t 

q 4 'ft U 


4.1. Downward (upward) IrnuL .Wumo tliul for i < j Htul all 
(4.11) u, < I) 

and 


(4.12) 


<a pjn • 


Those I'RtiniK'inonls ar(‘o(iniv!d(‘ii( to/*; .V, > .Y/, •' 1-2 and/'! .Y, •.Y),’ '■' 

I'iYj < Xicl and iiro satisliod if tho nhoiiialivi* to randoi inni,- n tlowiiwavd 

trend in the .sonso that /*',(/) < ( ' •' / . ■ r , j . ji, with at least 

one mtorviil of .strict inetin.ility. 

(4,11) and (4.12) ni(' not .‘'uHioiout tor to In* true, Tint'- !i->!.niiie in 

addition that, tlioro o.\-ist a |)o.‘vitivo inlef'or ii' atnl a iniinlM-r « •' 11 .Mich tliat 
1 n.b, n' Cij ' e then 


lirn i/. > lim ^ iA.i 

n-<« 7L SI'* I- , . j.1. , .1 


n-*«« U’ |{.^\ 


'yk-K- 




> 2d*hm iZ(k~h~ n')(n - fc-I- 

n’~‘oo 71'' k ’’1 


1 ) 




i) > 0. 


and the test is conai.stcnt, 

The case of an upward trend oaii be treated in o.’atctly the .sanu' wiiy. Tlio 
testis consistent with respect to aUernatives for whieli for i < j and all Uj > 0, 
> dj*, and g.l.h. j-.jjn- «■, =-■ e, where this time e > 0. 

Another test of randomness, the so-called $T$-test, has been proposed by Mann [6] with exactly this alternative of a downward (upward) trend in mind. This $T$-test is also consistent provided certain general conditions are satisfied. Thus the question arises which of the two tests should be chosen if a downward (upward) trend is feared. This question will be considered in Section 7.

4.2. Cyclical movement. Let the class of alternatives be specified by

$$\epsilon_{lg+\alpha,\, mg+\beta} = \epsilon_{\alpha\beta}, \qquad (\alpha, \beta = 1, \cdots, g > 1;\; l, m = 0, 1, \cdots), \tag{4.13}$$

in other words, assume that the statistic $R_h$ is used to test for randomness while under the alternative hypothesis there exists a regular cyclical movement with a period of length $g$. It is sufficient to consider the case $h \le g$.

If (4.13) is true,


(414) 


" n 

g + 0(n=) = n\ + 0(7i% 





wIh'M' 
(i l.'>| 
and 



'} tl"! 


n 


I 


i 


f F It i « 


Tlm^i in \v‘\\ ‘if .1 '■* ■is'" I' i 3 I'-ti nt if t) y (1. 

It h rj, V I'dnt *-'. U< r< nto -<5 'imuc.- .iiirl o> tlicn-ftur* > I) if somn 7 ^ 0. 
HowcviT It !■’ |i''' it'!' tijii* ' "iiio "! ••vrii ;i!i („<i y ()> to / ttl, and .still to. — 0, 
If liajijH'Ui. tlif 1' M' ini'oit s Hi-ni, nlhmMM' it 1 m I'nn.-isttMil. If under Ih 
the pnjtnlntioii'* f!"!!i ivlin-h r-di'< ni)i\ c uli.'-crvatniiis arc drawn dilTcr only in 
IdcatiMii, flit' !dit'\d iiieii*iMii>d (‘'k'-ejiti.iKal cast* cantinl hapiifti, and lire to.st is 
alwav.'* i'(tn^i'’t»'n? i« -]»■*■! (.i (tu- of alternatives 

If Jt if is not diltindt to er(ii'*tnii’( an I'vaniple wlierc ««al ^a. I A,. 7»S 0 
wliilo 1 f ft, U here the?., are a permnfationof the numb™ !,• , g. 

'I’hns 111 this ease it i-iiot -ullieseii) that ■■nine «« / 0 for tlie te.st to be con.si.stent. 
Consistency may also depend on the order of the elements of a period.

We may conclude that if $g$ is known, we should always choose $h = g$. If $g$ is not known, we may as well take $h = 1$.

4.3. Change in location. Turning now to the case when the test is performed on the basis of the original observations, it will often be appropriate to assume that under the alternative hypothesis the distribution remains the same except for a location parameter. We shall consider only the case of a cyclical movement.

Thus let

$$F_n(x) = F(x - m_n), \qquad (n = 1, 2, \cdots), \tag{4.16}$$

where $F(x)$ is the cdf of a chance variable $U$ having mean 0, and $m_n$ is a location parameter. It will also be assumed that $U$ has the positive variance $\sigma^2$ and a finite fourth moment.

In the cyclical case with period $g$

$$m_{lg+\alpha} = m_\alpha, \qquad (\alpha = 1, \cdots, g > 1;\; l = 0, 1, \cdots). \tag{4.17}$$

We shall find conditions under which our test is consistent with respect to alternatives of this kind. Obviously we can assume that $\sum_{\alpha=1}^{g} m_\alpha = 0$, since otherwise we could have subtracted $\bar{m}$ from every observation. Writing then $a_n = u_n + m_n$, $(n = 1, 2, \cdots)$, where $u_n$ can be considered as an observation on the previously defined chance variable $U$, we find


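$$A_1 = \sum_{i=1}^{n} a_i = \sum_{i=1}^{n} u_i + O(1),$$

$$A_2 = \sum_{i=1}^{n} u_i^2 + 2\sum_{i=1}^{n} u_i m_i + \sum_{i=1}^{n} m_i^2 = \sum_{i=1}^{n} u_i^2 + 2\sum_{\alpha=1}^{g} m_\alpha \sum_{l=0}^{l_\alpha} u_{lg+\alpha} + \left[\frac{n}{g}\right]\sum_{\alpha=1}^{g} m_\alpha^2 + O(1),$$

where $l_\alpha$ is the largest integer such that $l_\alpha g + \alpha \le n$ and $[n/g]$ the largest integer $\le n/g$. $A_3$ and $A_4$ are given by similar expressions. Since we assumed that $EU = 0$, $EU^2 = \sigma^2 > 0$, and $EU^4 < \infty$, we have with probability 1

$$\sum_{i=1}^{n} u_i = o(n), \quad \sum_{i=1}^{n} u_i^2 = \Omega(n), \quad \sum_{i=1}^{n} u_i^3 = O(n), \quad \sum_{i=1}^{n} u_i^4 = O(n),$$

so that with the same probability

$$A_1 = o(n), \quad A_2 = \Omega(n), \quad A_3 = O(n), \quad A_4 = O(n).$$

It follows that with probability 1

$$E^0 R_h = o(n), \qquad V^0 R_h \sim \frac{1}{n} A_2^2 = \Omega(n),$$

and condition (4.2) of Theorem 3 is satisfied.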
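Since further

$$\operatorname{var} R_h = \sum_{i=1}^{n} \operatorname{var}(x_i x_{i+h}) + 2\sum_{i=1}^{n} \operatorname{cov}(x_i x_{i+h},\, x_{i+h} x_{i+2h})$$

$$= \sum_{i=1}^{n} \{(\sigma^2 + m_i^2)(\sigma^2 + m_{i+h}^2) - m_i^2 m_{i+h}^2\} + 2\sum_{i=1}^{n} \{m_i m_{i+2h}(\sigma^2 + m_{i+h}^2) - m_i m_{i+h}^2 m_{i+2h}\}$$

$$= \sum_{i=1}^{n} \{\sigma^4 + \sigma^2(m_i^2 + m_{i+h}^2 + 2m_i m_{i+2h})\} = O(n) \tag{4.18}$$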

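and therefore except on a set of probability measure 0

$$\operatorname*{plim}_{n \to \infty} \frac{R_h}{\sqrt{nV^0 R_h}} = \lim_{n \to \infty} \frac{E(R_h \mid H_1)}{\sqrt{nV^0 R_h}} = \frac{\lim_{n \to \infty} \frac{1}{n} E(R_h \mid H_1)}{\sigma^2 + \frac{1}{g}\sum_{\alpha=1}^{g} m_\alpha^2},$$

condition (4.3) is satisfied provided $\lim_{n \to \infty} \frac{1}{n} E(R_h \mid H_1) \ne 0$. Now $E(R_h \mid H_1) = [n/g]\sum_{\alpha=1}^{g} m_\alpha m_{\alpha+h} + O(1)$, so that the test is consistent with respect to the class of alternatives (4.17) for which

$$\sum_{\alpha=1}^{g} (m_\alpha - \bar{m})(m_{\alpha+h} - \bar{m}) \ne 0,$$

where $\bar{m} = g^{-1}\sum_{\alpha=1}^{g} m_\alpha$. Thus by the same argument as in the case of ranks, the test is consistent whenever $h = g$, while it may or may not be consistent if $h < g$.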
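The dependence of this criterion on the lag can be illustrated with a short computation. The sketch below evaluates $\sum_{\alpha} (m_\alpha - \bar{m})(m_{\alpha+h} - \bar{m})$ for one period of location parameters, extending $m_\alpha$ periodically as in (4.17); the function name and the example period are illustrative assumptions, not taken from the paper.

```python
def cyclic_criterion(m, h):
    """Sum over one period of (m_a - mbar) * (m_{a+h} - mbar), indices taken mod g."""
    g = len(m)
    mbar = sum(m) / g
    return sum((m[a] - mbar) * (m[(a + h) % g] - mbar) for a in range(g))


if __name__ == "__main__":
    period = [1.0, 0.0, -1.0, 0.0]      # hypothetical cycle of length g = 4
    for h in (1, 2, 4):
        print(h, cyclic_criterion(period, h))
```

For this period the criterion vanishes at $h = 1$, so consistency is not guaranteed there, while at $h = g = 4$ it reduces to a sum of squares and is positive.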

5. Limiting distribution of $R_h$ under $H_1$ in case of ranks. For the remaining two sections, it is of importance to know conditions under which $R_h$ based on ranks is asymptotically normal under the alternative hypothesis. Using the method of moments, it can be shown that in this case the distribution of $(R_h - ER_h)/\sigma(R_h)$ tends to the normal distribution with mean 0 and variance 1 provided $\operatorname{var} R_h = \Omega(n^5)$.

Generalizing the method used in Section 4 in evaluating the variance of $R_h$, it is not difficult to see that $E(R_h - ER_h)^s = O(n^{[5s/2]})$, $(s = 0, 1, \cdots)$. It follows that if $\operatorname{var} R_h = \Omega(n^5)$, the odd moments are asymptotically zero. By means of a more careful analysis, it is also possible to show that $E(R_h - ER_h)^{2s} \sim (2s - 1)(2s - 3) \cdots 3\,(\operatorname{var} R_h)^s$. This proves our statement.

6. Ranks versus original observations. We have seen in Section 4 that if the alternative hypothesis is characterized by a regular cyclical movement the test based on $R_h$ is consistent both for original observations and for ranks, provided $h = g$, where $g$ is the length of a cycle. The question arises which test is more efficient, the one based on original observations or the one based on ranks.

In trying to answer this question, we shall make use of a procedure due to Pitman*, which allows us to compare two consistent tests of the hypothesis that some population parameter $\theta$ has the value $\theta^0$ against the alternatives $\theta > \theta^0$ using critical regions of size $\alpha$, $S_{in} \ge S_{in}(\alpha)$, $(i = 1, 2)$, where $S_{in}$ is a statistic having finite variance and $S_{in}(\alpha)$ is an appropriate constant. The relative efficiency of the second test with respect to the first test is defined as the ratio $n_1/n_2$ where $n_2$ is the sample size of the second test required to achieve the same power for a given alternative as is achieved by the first test using a sample of size $n_1$ with respect to the same alternative.

Let $E(S_{in} \mid \theta) = \psi_{in}(\theta)$, $\operatorname{var}(S_{in} \mid \theta) = \sigma_{in}^2(\theta)$, and $\psi'_{in}(\theta^0)/\sigma_{in}(\theta^0) = H_i(n)$. Assuming that the alternative is of the form $\theta_n = \theta^0 + k/\sqrt{n}$ where $k$ is a positive constant, Pitman has shown that the asymptotic relative efficiency of the second test with respect to the first test is given by $\lim_{n\to\infty} H_2^2(n)/H_1^2(n)$ provided there exists a number $\epsilon > 0$ such that for $\theta^0 \le \theta < \theta^0 + \epsilon$

(6.1) $\psi'_{in}(\theta)$ exists;

as $\theta_n \to \theta^0$ with $n \to \infty$

(6.2) $\psi'_{in}(\theta_n)/\psi'_{in}(\theta^0) \to 1$

and

(6.3) $\sigma_{in}(\theta_n)/\sigma_{in}(\theta^0) \to 1$;

(6.4) $\lim_{n\to\infty} n^{-1/2} H_i(n) = c_i$, where $c_i$ is some positive constant;

(6.5) the distribution of $[S_{in} - \psi_{in}(\theta)]/\sigma_{in}(\theta)$ tends to the normal distribution with mean 0 and variance 1 uniformly in $\theta$.

* I should like to thank Professor Pitman for his kind permission to quote from his lectures on non-parametric statistical inference which he delivered at Columbia University during the spring semester 1948.



Condition (6.5) can be replaced by the weaker condition

(6.5') the distribution of $[S_{in} - \psi_{in}(\theta_n)]/\sigma_{in}(\theta_n)$ tends to the normal distribution with mean 0 and variance 1 as $n \to \infty$.

In our case, in order to insure consistency, it will be assumed that $h = g$. Consider the parameter

$$\theta = \frac{1}{h} \sum_{\alpha=1}^{h} (m_\alpha - \bar m)^2, \tag{6.6}$$

where as before $m_\alpha$ is the expected value of the $(jh + \alpha)$th observation, $(j = 0, 1, \cdots)$. We want to find the asymptotic relative efficiency of the test performed on ranks with respect to the test performed on original observations as $\theta \to 0$ with $n \to \infty$.

Again it is no restriction to assume that

$$\bar m = \frac{1}{h} \sum_{\alpha=1}^{h} m_\alpha = 0. \tag{6.7}$$


Assume further that the chance variable $U$ defined in 4.3 has a finite absolute moment of order $4 + \delta$, $\delta > 0$. Then $R_h$, standardized by its mean and standard deviation in the population of permutations, is $\sim \sqrt{n}\, R_h/A_2$ with probability 1 and, if the null hypothesis is true, it follows from Theorem 2 that with the same probability the statistic

$$Q_h = \frac{\sqrt{n} \sum_{i=1}^{n} x_i x_{i+h}}{\sum_{i=1}^{n} x_i^2}$$

has in the population of permutations of the observed sample values an asymptotically normal distribution with mean 0 and variance 1. This, however, is also the limiting distribution of $Q_h$ under random sampling when the null hypothesis is true, as follows from the results of Hoeffding and Robbins [3]. Thus it will be sufficient to find the asymptotic relative efficiency of the $R_h$-test for ranks with respect to the $Q_h$-test. In doing this, it will also be assumed that $U$ has a continuous density function $f(x) = F'(x)$, and, in order to simplify notation, that there are $nh$ observations instead of $n$.

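In finding $H_Q(nh)$, let $x_{\alpha j} = x_{(j-1)h+\alpha}$ and $u_{\alpha j} = u_{(j-1)h+\alpha}$, $(\alpha = 1, \cdots, h;\ j = 1, \cdots, n)$. Then

$$\frac{1}{nh} \sum_{\alpha=1}^{h} \sum_{j=1}^{n} x_{\alpha j}^2 = \frac{1}{nh} \sum_{\alpha=1}^{h} \sum_{j=1}^{n} (u_{\alpha j} + m_\alpha)^2 = \frac{1}{nh} \sum_{\alpha=1}^{h} \Big\{ \sum_{j=1}^{n} u_{\alpha j}^2 + 2 m_\alpha \sum_{j=1}^{n} u_{\alpha j} + n m_\alpha^2 \Big\} \to \sigma^2 + \theta.$$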


Further, 



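so that

$$EQ_h \sim \sqrt{nh}\, \frac{\theta}{\sigma^2 + \theta} = \psi_{Qn}(\theta).$$

Therefore

$$\psi'_{Qn}(\theta) \sim \sqrt{nh}\, \sigma^2 (\sigma^2 + \theta)^{-2}.$$

Also by (4.18)

$$\operatorname{var} Q_h \sim \frac{nh(\sigma^4 + 4\theta\sigma^2)}{nh(\sigma^2 + \theta)^2},$$

which converges to 1 as $\theta \to 0$. It follows that

$$H_Q(nh) \sim \psi'_{Qn}(0) \sim \frac{\sqrt{nh}}{\sigma^2}. \tag{6.8}$$

Conditions (6.1)-(6.5) are easily seen to be satisfied.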

Considering now the $R_h$-test for ranks, we know that $(nh)^{-5/2} R_h$ has finite variance. From (4.7) and (4.11)-(4.16) it is found that

(0,9) J'.'lCii/ij " !{> Oj S fS m') ='f'KniO) 

h' n-l V -1 / 

and after some computations


( 0 . 10 ) fnJi)) 

From (4..1) and (O.U)) 

$$H_R(nh) \sim 12 \sqrt{nh}\, \Big[ \int_{-\infty}^{\infty} f^2(x)\, dx \Big]^2.$$

Conditions (6.1)-(6.4) and (6.5') can be shown to be satisfied.

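Thus the asymptotic relative efficiency of the test based on ranks with respect to the test based on original observations is

$$\lim_{n\to\infty} \frac{H_R^2(nh)}{H_Q^2(nh)} = \lim_{n\to\infty} \frac{144\, nh \big[ \int_{-\infty}^{\infty} f^2(x)\, dx \big]^4}{nh/\sigma^4} = 144\, \sigma^4 \Big[ \int_{-\infty}^{\infty} f^2(x)\, dx \Big]^4. \tag{6.11}$$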


As is not difficult to see, this expression is independent of location and scale. Let the chance variable $U$ have density function

$$f(x) = \begin{cases} 0, & x < -1,\ x > 1, \\[4pt] \dfrac{1 + x}{1 + a}, & -1 \le x \le a, \\[8pt] \dfrac{1 - x}{1 - a}, & a \le x \le 1, \end{cases} \qquad -1 < a < 1,$$

i.e., let the graph of $f(x)$ be given by the two straight lines connecting the points $(-1, 0)$ and $(1, 0)$ with the point $(a, 1)$. Then $EU = a/3$, $\operatorname{var} U = \frac{1}{18}(3 + a^2)$, $\int_{-\infty}^{\infty} f^2(x)\, dx = 2/3$, and (6.11) becomes $[8(3 + a^2)/27]^2$. Thus (6.11) increases with $|a|$. For $a = 0$, it is equal to $64/81$; for $|a| = 1$, it is equal to $(32/27)^2$. It is equal to 1 for $|a| = \sqrt{6}/4$.
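The triangular example can be checked numerically. The following is only a sketch: the function names and the midpoint-rule integrator are illustrative choices, not part of the paper. It evaluates expression (6.11), $144\,\sigma^4[\int f^2(x)\,dx]^4$, for the density with vertices $(-1, 0)$, $(a, 1)$, $(1, 0)$ and compares it with the closed form $[8(3 + a^2)/27]^2$.

```python
from math import isclose

def triangular_density(x, a):
    """Density with vertices (-1, 0), (a, 1), (1, 0); assumes -1 < a < 1."""
    if x < -1.0 or x > 1.0:
        return 0.0
    if x <= a:
        return (1.0 + x) / (1.0 + a)
    return (1.0 - x) / (1.0 - a)

def integrate(func, lo, hi, steps=20_000):
    """Composite midpoint rule (an illustrative choice of quadrature)."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (i + 0.5) * width) for i in range(steps))

def rank_vs_raw_efficiency(a):
    """Evaluate 144 * sigma^4 * (integral of f^2)^4, i.e. expression (6.11)."""
    f = lambda x: triangular_density(x, a)
    mean = integrate(lambda x: x * f(x), -1.0, 1.0)
    var = integrate(lambda x: (x - mean) ** 2 * f(x), -1.0, 1.0)
    int_f2 = integrate(lambda x: f(x) ** 2, -1.0, 1.0)
    return 144.0 * var ** 2 * int_f2 ** 4

def closed_form(a):
    """The value [8(3 + a^2)/27]^2 quoted in the text."""
    return (8.0 * (3.0 + a * a) / 27.0) ** 2

if __name__ == "__main__":
    for a in (0.0, 0.5, 6 ** 0.5 / 4):
        print(a, rank_vs_raw_efficiency(a), closed_form(a))
```

The two columns should agree to several decimals, and the value passes through 1 near $a = \sqrt{6}/4$, which is the crossover stated above.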

This example shows that the asymptotic relative efficiency of the rank test with respect to the test based on original observations may be $< 1$, $= 1$, or $> 1$, depending on the density function $f(x)$. Unless $f(x)$ is explicitly given, no statement can be made as to which of the two tests is to be preferred.

We are now in a position to give at least a partial answer to a question raised in [1]. In concluding their paper, Wald and Wolfowitz note that the problem dealt with in this section can be posed not only when transforming to ranks, but also for any transformation carried out by means of a continuous and strictly monotonic function $h(x)$.

Let $t = h(x)$ be such a transformation, satisfying in addition the condition that Pitman's procedure remains applicable for the transformed distribution. Corresponding to $\sigma^2$ and $Q$ we shall use $\sigma_t^2$ and $Q_t$. Let $h(m_\alpha) = \mu_\alpha$, $\frac{1}{h} \sum_{\alpha=1}^{h} (\mu_\alpha - \bar\mu)^2 = \eta$. Then if $EQ_t = \psi_{Q_t n}$, by (6.8), (6.9), and (6.10)


( 6 . 12 ) 


de 




#Q,n dd- dr; 
d& dr; ^ »~o 


ViTh 


1 

f r” . 

1- 

H 

8 

1_ 

dj 

L-'-® J 


= IIM, 


where $g(t)$ is the inverse of $h(x)$. Therefore by (6.8) and (6.12)


< 

"J 

r fix) dx] 

-00 j 

f 


r 

'“00 

flg{tW\t) dtX 


and the asymptotic relative efficiency does not merely depend on $h(x)$, the operator defining the transformation, but also very essentially on the underlying distribution $f(x)$.


7. Comparison of the $R_h$- and $T$-tests. The $T$-test by Mann [6] designed to test for randomness against a downward trend is based on the statistic

$$T = \sum_{i<j} \big( y_{ij} + \tfrac{1}{2} \big) = \sum_{i<j} y_{ij} + \tfrac{1}{4} n(n - 1),$$

where $y_{ij}$ is defined by (4.5). Making the same assumptions as in 4.1, Mann shows that under the null hypothesis $T$ has a limiting normal distribution with mean $\frac{1}{4} n(n - 1)$ and variance $\frac{1}{72}(2n^3 + 3n^2 - 5n)$, while under the alternative hypothesis

$$ET = \tfrac{1}{4} n(n - 1)(2\xi_n + 1), \tag{7.1}$$

where $\xi_n$ is defined by $\frac{1}{2} n(n - 1)\, \xi_n = \sum_{i<j} E y_{ij}$.

Let

$$S_n = 6 n^{-3/2} \big[ T - \tfrac{1}{4} n(n - 1) \big].$$


When $H_0$ is true, $S_n$ is asymptotically normal with mean 0 and variance 1. If we then put $\phi(\lambda) = \frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} e^{-x^2/2}\, dx$, a critical region for testing $H_0$ is given by $S_n < -\lambda$, where $\lambda$ is determined in such a way that $\phi(\lambda) = \alpha$, the level of significance.

When $H_1$ is true, we find from (7.1)

$$E(S_n \mid H_1) \sim 3\sqrt{n}\, \xi_n.$$

By paralleling the proof of asymptotic normality of $R_h$ under $H_1$ given in Section 5, it can be shown that $(S_n - ES_n)/\sigma(S_n)$ is asymptotically normal with mean 0 and variance 1 provided $\sigma(S_n) = \Omega(1)$. This is essentially the result obtained already by Hoeffding [7]. Thus the asymptotic power of the test based on $S_n$ is given by

$$P\{S_n < -\lambda\} \sim \phi\Big( \frac{\lambda + ES_n}{\sigma(S_n)} \Big) \tag{7.2}$$

converging to 1, provided $\lim_{n\to\infty} \sqrt{n}\, \xi_n = -\infty$. This is the condition for consistency given by Mann.

We may ask for the asymptotic power of the $S_n$-test as $\xi_n \to 0$ with $n \to \infty$. More exactly, instead of considering a certain alternative $E y_{ij} = h_{ij}$, where the $h_{ij}$ are given constants, consider the alternative (changing with $n$)

$$E y_{ij} = \frac{h_{ij}}{\sqrt{n}}. \tag{7.3}$$

If then as $n \to \infty$

$$\frac{2}{n(n - 1)} \sum_{i} \sum_{j>i} h_{ij} \to -k$$

and

$$\sigma(S_n) \to 1,$$

it follows from (7.2) that the asymptotic power of the $S_n$-test, and therefore of the $T$-test, for alternatives (7.3) is equal to

$$\phi(\lambda - 3k).$$







Now use $R_h$ based on ranks instead of $T$.

We kmw tliiit ’sUa 11 H )'»injt* 

I 

7(* '*■ It 

tin r “ V J 


uliw lit, is (livfn l»y i l.t] •. is ;* vinji^Mti.- sHv iwth tticaii (I tind variaiico 1. 

TlniKinflii.Hnwdherritii'iil regniK r-ttiiii'ii I'vA'!, * 
we lind 

and aKyiniilntieiilly the iKmcr nf th*^ h*\ ii •( is 


(7.1) 






"C 




inovuU'd (rf/iil) ■ n(l)-Thus llu-fi-l is iflmin., v^d., ^.ITow- 

(iv(‘r, fur llio aUcriKilivc (7.(1 lend^ <■* t,sf.\ i jimvidi'd that, as n —*■ w 


rril:\) > I 


Thus the $R_h$-test is ineffective with respect to the alternative (7.3) in contrast to the $T$-test. This means that for this alternative the asymptotic relative efficiency of the $R_h$-test with respect to the $T$-test is 0.


Acknowledgment. The author wishes to acknowledge the valuable help of Professor J. Wolfowitz who suggested the topic and under whose direction the work was completed.


REFERENCES

[1] A. Wald and J. Wolfowitz, "An exact test for randomness in the non-parametric case, based on serial correlation," Annals of Math. Stat., Vol. 14 (1943), pp. 378-388.
[2] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, Princeton, 1946.
[3] W. Hoeffding and H. Robbins, "The central limit theorem for dependent random variables," Duke Math. J., Vol. 15 (1948), pp. 773-780.
[4] G. E. Noether, "On a theorem by Wald and Wolfowitz," Annals of Math. Stat., Vol. 20 (1949), pp. 455-458.
[5] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937.
[6] H. B. Mann, "Nonparametric tests against trend," Econometrica, Vol. 13 (1945), pp. 245-259.
[7] W. Hoeffding, "A class of statistics with asymptotically normal distributions," Annals of Math. Stat., Vol. 19 (1948), pp. 293-325.



THE DISTRIBUTION OF THE NUMBER OF EXCEEDANCES¹

By E. J. Gumbel and H. von Schelling

New York and Naval Medical Research Laboratory, New London, Connecticut

0. The problem. We study the probability that the $m$th observation in a sample of size $n$ taken from an unknown distribution of a continuous variate will be exceeded $x$ times in $N$ future trials, and calculate the averages, the moments, and the cumulative probability function of the number of exceedances. This problem leads to the hypergeometric series. Our starting point is a special case of a distribution studied by Wilks [3] who considered several order statistics whereas we consider only one. His tolerance limits are special cases of our cumulative probability function. Thus the present paper is, at the same time, a specialization and a generalization of the work done by Wilks.

1. Distribution. From a continuous variate $\xi$ an alternative is constructed by choosing the $m$th among $n$ observations $\xi_m$ $(m = 1, 2, \cdots, n)$. The rank $m$ is counted from the top, which means that $m = 1$ $(m = n)$ stands for the largest (smallest) observation. The observation $\xi_m$ is thus the $m$th largest value. We ask: In how many cases $x$ will the past $m$th observation be equalled or exceeded in $N$ future trials taken from the same population? For the sake of simplicity, $x$ is called the number of exceedances.

If the initial probability $F(\xi_m) = F_m$ for a value less than $\xi_m$ is known, the alternative probability for exceeding $\xi_m$ is $1 - F_m$, and Bernoulli's theorem gives the probability

$$w(F_m, N, x) = \binom{N}{x} (1 - F_m)^x F_m^{N-x} \tag{1.1}$$

that $x$ among $N$ future trials will exceed $\xi_m$. However, as a rule the probability $F_m$ is unknown. The only data known are the $n$ past observations. To eliminate the probability $F_m$, we introduce the distribution $v(F_m)$ of the frequency $F_m$ of the $m$th largest among $n$ values

$$v(n, m, F_m)\, dF_m = \binom{n}{m} m F_m^{n-m} (1 - F_m)^{m-1}\, dF_m, \tag{1.2}$$

consider $F_m$ as a variate, and integrate (1.1) over all values of this variate. Thus $F_m$ is replaced by a function of $n$ and $m$.

The convolution of (1.1) and (1.2) leads to the distribution $w(n, m, N, x)$ of the number of exceedances in $N$ future trials

$$w(n, m, N, x) = \frac{\binom{n}{m} m \binom{N}{x}}{(N + n) \binom{N+n-1}{m+x-1}}. \tag{1.3}$$

This probability depends upon the parameters $n$, $m$, $N$ and $x$, but not upon the unknown probability $F_m$. Therefore it is distribution-free. If we are interested in the dependence of $w$ on $x$, we will simply write $w(x)$. The conditions for the integers $m$ and $x$, and for the probability are

$$1 \le m \le n; \quad 0 \le x \le N; \quad \sum_{x=0}^{N} w(x) = 1. \tag{1.3'}$$

¹ Opinions or conclusions contained in this paper are those of the authors. They are not to be construed as necessarily reflecting the views or endorsement of the Navy Department.

The distribution (1.3) possesses the following symmetry

$$w(n, m, N, x) = w(n, n - m + 1, N, N - x) \tag{1.4}$$

which reads: The probability that the past $m$th observation from above will be exceeded $x$ times in $N$ new trials is equal to the probability that the past $m$th value from below will be exceeded $N - x$ times.

The $nN$ probabilities $w(n, m, N, x)$ are linked by several recurrence formulas which follow easily from the usual combinatorial rules. For fixed $m$, the probability for $x + 1$ is obtained from the probability for $x$ by

$$w(n, m, N, x + 1) = w(n, m, N, x)\, \frac{(N - x)(m + x)}{(N + n - m - x)(x + 1)} = w(n, n - m + 1, N, N - x - 1). \tag{1.5}$$

In the same way, the probabilities $w(n, m, N, x + 1)$, $w(n, m + 1, N, x)$ and $w(n, x, N, m)$ are easily obtained from the probabilities $w(n, m, N, x)$. The distribution (1.3) has many aspects since, besides the number of exceedances $x$, also the rank $m$ and the number of future trials $N$ may be considered as variates.

For $m = 1$ and $m = n$, the distribution of the number of exceedances over the largest value diminishes with $x$, and the distribution of the number of exceedances over the smallest value increases with $x$. For $x = 0$, and $m = 1$, we obtain from (1.3)

$$w(n, 1, N, 0) = \frac{n}{N + n}. \tag{1.6}$$

For $x = 0$, $m = n$, the probability that the smallest observation will never be exceeded, equal to the probability that the largest value will always be exceeded, is very small, even for moderate sample sizes.

If $n$ is odd, then $m = (n + 1)/2$ corresponds to the median of the initial variable $\xi$, and the symmetry relation (1.4) becomes

$$w(n, (n + 1)/2, N, x) = w(n, (n + 1)/2, N, N - x). \tag{1.7}$$

It is equally probable that the median of the $n$ past observations is surpassed $x$, or $N - x$ times in $N$ future trials.
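The relations of this section lend themselves to an exact numerical check. The sketch below assumes the form of (1.3) given by integrating (1.1) against (1.2); the function name `exceedance_pmf` is illustrative only. It uses exact rational arithmetic so that the normalization (1.3'), the symmetry (1.4), the recurrence (1.5) and the special value (1.6) can be tested without rounding.

```python
from fractions import Fraction
from math import comb

def exceedance_pmf(n, m, N, x):
    """w(n, m, N, x) of (1.3), returned as an exact fraction."""
    if not (1 <= m <= n and 0 <= x <= N):
        raise ValueError("need 1 <= m <= n and 0 <= x <= N")
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))

if __name__ == "__main__":
    n, m, N = 7, 3, 5
    probs = [exceedance_pmf(n, m, N, x) for x in range(N + 1)]
    print("sum of probabilities:", sum(probs))
    print("symmetry (1.4) holds:",
          all(probs[x] == exceedance_pmf(n, n - m + 1, N, N - x)
              for x in range(N + 1)))
    print("w(n, 1, N, 0):", exceedance_pmf(n, 1, N, 0))
```

Because the result is a `Fraction`, equalities such as (1.4) can be compared with `==` rather than with a tolerance.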


2. The two asymptotic distributions. If both $n$ and $N$ are large, $m$ may increase with $n$ such that the quotient $m/n$ remains constant, and the $m$th values remain near the median. Or, $m$ remains constant such that $m \ll n$, and the $m$th values are extremes.

In the first case, let $n = N = 2k - 1$, where $k$ is large. Then $m = k$ is the rank of the median of the initial distribution. As shown in (1.7), the distribution of the number of exceedances over the initial median is symmetrical. To obtain the asymptotic distribution we reduce $x$ by writing

$$x = k + z\sqrt{k} \tag{2.1}$$

where $z$ remains in a finite interval. The same reduction may be applied to $m$th values in the neighborhood of the initial median. The distribution of the number of exceedances over the initial median is, from (1.3) and (2.1),

$$w(2k - 1, k, 2k - 1, x) = \text{const}\ \binom{2k - 1}{k + z\sqrt{k}} \Big/ \binom{4k - 3}{2k + z\sqrt{k} - 1}.$$

Consider only the factors involving the variate $z$, then the right side becomes, by Stirling's formula and apart from factors that tend to constants as $k$ increases,

$$\frac{(2k + z\sqrt{k} - 1)!\,(2k - z\sqrt{k} - 2)!}{(k + z\sqrt{k})!\,(k - z\sqrt{k} - 1)!} \sim \frac{(2k + z\sqrt{k})^{2k + z\sqrt{k}}\, (2k - z\sqrt{k})^{2k - z\sqrt{k}}}{(k + z\sqrt{k})^{k + z\sqrt{k}}\, (k - z\sqrt{k})^{k - z\sqrt{k}}}.$$

Combination of the factors with the same powers leads to

$$\frac{(4k^2 - kz^2)^{2k}}{(k^2 - kz^2)^{k}} \left( \frac{(2k + z\sqrt{k})(k - z\sqrt{k})}{(2k - z\sqrt{k})(k + z\sqrt{k})} \right)^{z\sqrt{k}}.$$

Since $k$ and $\sqrt{k}$ are large, and $z$ is small, all factors lead to exponential functions whence

$$\lim_{k\to\infty} w(2k - 1, k, 2k - 1, x) = \text{const}\ e^{-z^2/2}\, e^{z^2}\, e^{-z^2} = \text{const}\ e^{-z^2/2}$$

and finally,

$$w(x) = \frac{1}{\sqrt{2\pi k}} \exp\Big[ -\frac{(x - k)^2}{2k} \Big]. \tag{2.2}$$

The number of exceedances over the initial median, $m = k$, in a large sample of size $2k - 1$ in $2k - 1$ future trials is normally distributed with mean, median, mode, and variance equal to $k$. Therefore the probabilities (2.2) may be called the distribution of normal exceedances.

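In the second case where $N$ and $n$ are large, and $m$ and $x$ are small, a distribution analogous to the Poisson distribution will be obtained. To indicate that $N$ and $n$ are large, they are written $\underline{N}$ and $\underline{n}$. The probability

$$w(\underline{n}, m, \underline{N}, x) = \frac{(x + m - 1)!\ \underline{n}!\ \underline{N}!\ (\underline{N} + \underline{n} - x - m)!}{(m - 1)!\ x!\ (\underline{n} - m)!\ (\underline{N} - x)!\ (\underline{N} + \underline{n})!}$$

obtained from (1.3) becomes, by use of the Stirling formula,

$$w(\underline{n}, m, \underline{N}, x) = \binom{x + m - 1}{x} \frac{\underline{n}^{m}\, \underline{N}^{x}}{(\underline{N} + \underline{n})^{m+x}} = w(\underline{n}, \underline{n} - m + 1, \underline{N}, \underline{N} - x). \tag{2.3}$$

If $\underline{n} = \underline{N}$, the preceding formula becomes

$$w(\underline{n}, m, \underline{n}, x) = \binom{x + m - 1}{x} \Big( \frac{1}{2} \Big)^{m+x} = w(\underline{n}, \underline{n} - m + 1, \underline{n}, \underline{n} - x). \tag{2.4}$$

This probability that the $m$th largest (or smallest) value will be exceeded $x$ times (or $\underline{n} - x$ times) in $\underline{n}$ future trials is independent of $\underline{n}$. Since $m$ is small compared to $\underline{n}$, the probabilities (2.4) may be called the distribution of rare exceedances. For $x = 0$, we obtain the probability

$$w(\underline{n}, m, \underline{n}, 0) = \Big( \frac{1}{2} \Big)^{m} = w(\underline{n}, \underline{n} - m + 1, \underline{n}, \underline{n})$$

that the $m$th largest (or smallest) value is never (or always) exceeded. For $m = 1$, and $\underline{n} = \underline{N}$, the probability

$$w(\underline{n}, 1, \underline{n}, x) = \Big( \frac{1}{2} \Big)^{x+1} = w(\underline{n}, \underline{n}, \underline{n}, \underline{n} - x) \tag{2.5}$$

that the largest (or smallest) value is exceeded $x$ times (or $\underline{n} - x$ times) is a geometric series.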
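How quickly the exact probabilities approach the rare-exceedance form can be seen numerically. The sketch below assumes the exact distribution (1.3) and the limiting form $\binom{x+m-1}{x}(1/2)^{m+x}$ for $n = N$; the function names are illustrative. It reports the relative gap between the two for increasing $n$.

```python
from fractions import Fraction
from math import comb

def exceedance_pmf(n, m, N, x):
    """Exact w(n, m, N, x) of (1.3)."""
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))

def rare_exceedance_pmf(m, x):
    """Limiting form for n = N large: negative binomial with parameter 1/2."""
    return comb(x + m - 1, x) * 0.5 ** (m + x)

def relative_gap(n, m, x):
    """|exact / limit - 1| for n = N."""
    exact = float(exceedance_pmf(n, m, n, x))
    return abs(exact / rare_exceedance_pmf(m, x) - 1.0)

if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, relative_gap(n, m=3, x=2))
```

For fixed small $m$ and $x$ the gap shrinks roughly in proportion to $1/n$, which is consistent with treating (2.4) as independent of $n$ once $n$ is large.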

To obtain the moments of the distribution of rare exceedances (2.4) we construct its generating function

$$G_x(t) = \sum_{x=0}^{\infty} e^{xt} \binom{x + m - 1}{m - 1} \Big( \frac{1}{2} \Big)^{x+m}.$$

From the well known expression for the negative binomial follows

$$G_x(t) = (2 - e^{t})^{-m} \tag{2.6}$$

whence, by the usual procedure

$$\bar x = m. \tag{2.7}$$

The mean number of exceedances over the $m$th value from above in the distribution of rare exceedances is $m$ itself. The second derivative of (2.6) for $t = 0$ leads to the variance

$$\sigma^2 = 2m \tag{2.8}$$

which is the double of the variance in the Poisson distribution. This difference is easily explained: If we apply the Poisson law to the exceedances, we have to know the mean number of exceedances. In our case we only know one observed number of exceedances. Consequently the variance must be larger than in the Poisson case.


GRAPH 1



The variance for the distribution (2.2) of the normal exceedances was $(N + 1)/2$, whereas the variance (2.8) for the distribution of rare exceedances, $2m$, is much smaller since $m$ is small compared to $N$. This interesting relation will be generalized in paragraph 3.

For $m$ increasing, the distributions (2.4) spread as shown in Graph 1. The distributions have two modes

$$\tilde x_1 = m - 2, \quad \tilde x_2 = m - 1 \tag{2.9}$$

except for $m = 1$, where the probability diminishes with $x$. The distributions (2.4) are similar to the Poisson distribution for integer $m$. However, for this distribution the modes are $m - 1$ and $m$.





The similarity between the two distributions may also be seen from their behavior for large $m$. In this case, the Poisson distribution for the standardized variate $y = (x - m)/\sigma$ converges toward a normal distribution. The same holds for the distribution of rare exceedances. For the proof consider the standardized variate

$$y = (x - m)/\sqrt{2m}. \tag{2.10}$$

Its moment generating function $G_y(t)$ becomes, from (2.6),

$$G_y(t) = e^{-t\sqrt{m/2}}\, \big( 2 - e^{t/\sqrt{2m}} \big)^{-m}.$$

The usual development leads to the second member

$$G_y(t) = \exp\big[ t^2/2 + O(m^{-1/2}) \big].$$

If we neglect the factors $O(m^{-1/2})$ we finally obtain

$$G_y(t) = e^{t^2/2} \tag{2.11}$$

which is the normal generating function. Thus the distribution of rare exceedances converges toward normalcy in the same way as the Poisson distribution.


3. Moments. We return to the general distribution (1.3). For the calculation of the moments, the hypergeometric series $F(\alpha, \beta, \gamma, 1)$ defined by

$$F(\alpha, \beta, \gamma, 1) = 1 + \frac{\alpha\beta}{1 \cdot \gamma} + \frac{\alpha(\alpha + 1)\,\beta(\beta + 1)}{1 \cdot 2\ \gamma(\gamma + 1)} + \cdots \tag{3.1}$$

is used. The $(x + 1)$st member of this series is

$$\varphi(x) = \frac{\alpha(\alpha + 1) \cdots (\alpha + x - 1)\ \beta(\beta + 1) \cdots (\beta + x - 1)}{x!\ \gamma(\gamma + 1) \cdots (\gamma + x - 1)}. \tag{3.2}$$

On the other hand, the $(x + 1)$st member of the distribution $w(x)$ may be written, from (1.3), after changing the signs,

$$w(x) = \frac{n!\,(N + n - m)!}{(n - m)!\,(N + n)!} \cdot \frac{m(m + 1) \cdots (m + x - 1)}{x!} \cdot \frac{(-N)(-N + 1) \cdots (-N + x - 1)}{(m - n - N)(m - n - N + 1) \cdots (m - n - N + x - 1)}. \tag{3.3}$$

This is the general member (3.2) of the hypergeometric series, if we write

$$\alpha = m; \quad \beta = -N; \quad \gamma = m - n - N. \tag{3.4}$$


Therefore the probability $w(n, m, N, x)$ is the $(x + 1)$st member in the development of

$$\frac{n!\,(N + n - m)!}{(n - m)!\,(N + n)!}\ F(m, -N, m - n - N, 1).$$

Since the sum of the probabilities must be unity, we obtain

$$F(m, -N, m - n - N, 1) = \frac{(n - m)!\,(N + n)!}{n!\,(N + n - m)!}. \tag{3.5}$$

This relation will be used for the calculation of the factorial moments $\bar x_{[k]}$ of order $k$ which are, from (3.3),

$$\bar x_{[k]} = \frac{n!\,(N + n - m)!}{(n - m)!\,(N + n)!} \sum_{x=k}^{N} \frac{N(N - 1) \cdots (N - x + 1)\ m(m + 1) \cdots (m + x - 1)}{(x - k)!\ (N + n - m)(N + n - m - 1) \cdots (N + n - m - x + 1)}. \tag{3.6}$$

The first member in the sum is

$$\varphi(1) = \frac{N(N - 1) \cdots (N - k + 1)\ m(m + 1) \cdots (m + k - 1)}{0!\ (N + n - m)(N + n - m - 1) \cdots (N + n - m - k + 1)}. \tag{3.7}$$

The second member is

$$\varphi(2) = \varphi(1)\, \frac{(N - k)(m + k)}{1!\ (N + n - m - k)}.$$

Generally, each successive member is obtained from the preceding one by the same rules as the successive members of the hypergeometric series (3.1). Consequently, from (3.6),

$$\bar x_{[k]} = \frac{n!\,(N + n - m)!}{(n - m)!\,(N + n)!}\ \varphi(1) \Big[ 1 + \frac{(N - k)(m + k)}{1!\ (N + n - m - k)} + \cdots \Big]. \tag{3.8}$$

The sum in the brackets is the hypergeometric series

$$F(m + k, -(N - k), m - n - N + k, 1).$$

If we replace, in (3.5), $m$ by $m + k$, $N$ by $N - k$, $n$ by $n + k$, we obtain for the sum in (3.8)

$$F(m + k, -(N - k), m - n - N + k, 1) = \frac{(N + n)!\,(n - m)!}{(n + k)!\,(N + n - m - k)!}. \tag{3.9}$$

Introduction of (3.9) and (3.7) into (3.8) leads to the factorial moments

$$\bar x_{[k]} = \frac{m(m + 1) \cdots (m + k - 1)\ N(N - 1) \cdots (N - k + 1)}{(n + 1)(n + 2) \cdots (n + k)} \tag{3.10}$$


and to the recurrent relation

$$\bar x_{[k]} = \bar x_{[k-1]}\, \frac{(m + k - 1)(N - k + 1)}{n + k}. \tag{3.10'}$$

If $n$ and $N$ are both of the same order of magnitude, and large compared to $k$, the expression (3.10) simplifies to

$$\bar x_{[k]} = m(m + 1) \cdots (m + k - 1). \tag{3.10''}$$

GRAPH 2. Averages of numbers of exceedances.



For $k = 1$ we obtain the mean number of exceedances over the $m$th largest value in $N$ future trials

$$\bar x_m = N\, \frac{m}{n + 1}. \tag{3.11}$$

This expression is identical with the classical formula $\bar x = N(1 - F_m)$ in the Bernoulli distribution (1.1), since the mean of $1 - F_m$ obtained from (1.2) is $m/(n + 1)$. In both distributions the means need not be integers. The mean number of exceedances over the smallest value is $n$ times the mean number of exceedances over the largest value. If $N = n + 1$ we have $\bar x_m = m$, and the same holds if $n$ and $N$ are large. If $n$ is odd, and $m = (n + 1)/2$, the mean number of exceedances over the median of $n$ observations is $N/2$. The means are traced against $m$ in Graph 2 for $n = N = 9$, and $n = N = 10$.
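The closed form (3.10) for the factorial moments can be compared with a direct summation over the distribution. This is a sketch under the assumption that (1.3) has the form used below; the helper names are illustrative. Exact fractions are used so the comparison is an equality test.

```python
from fractions import Fraction
from math import comb

def exceedance_pmf(n, m, N, x):
    """Exact w(n, m, N, x) of (1.3)."""
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))

def factorial_moment(n, m, N, k):
    """k-th factorial moment by direct summation over x = 0, ..., N."""
    total = Fraction(0)
    for x in range(N + 1):
        falling = 1
        for j in range(k):
            falling *= x - j          # x (x - 1) ... (x - k + 1)
        total += falling * exceedance_pmf(n, m, N, x)
    return total

def factorial_moment_closed(n, m, N, k):
    """Right-hand side of (3.10)."""
    value = Fraction(1)
    for j in range(k):
        value *= Fraction((m + j) * (N - j), n + 1 + j)
    return value

if __name__ == "__main__":
    n, m, N = 9, 4, 9
    for k in range(1, 5):
        print(k, factorial_moment(n, m, N, k), factorial_moment_closed(n, m, N, k))
```

For $k = 1$ the value reduces to the mean $Nm/(n + 1)$ of (3.11).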




The mean number ${}_m\bar x$ of exceedances over the $m$th value from below is related to $\bar x_m$ by

$$\bar x_m + {}_m\bar x = N. \tag{3.12}$$

The variances $\sigma_m^2$ and ${}_m\sigma^2$ of the number of exceedances over the $m$th values from above and below become, from (3.10),

$$\sigma_m^2 = \frac{mN}{n + 1} \Big( \frac{(m + 1)(N - 1)}{n + 2} + 1 - \frac{mN}{n + 1} \Big).$$

The choice of a common denominator leads, after trivial calculations, to

$$\sigma_m^2 = {}_m\sigma^2 = \frac{m N (n - m + 1)(N + n + 1)}{(n + 1)^2 (n + 2)}. \tag{3.13}$$

The variances increase with $N$ and diminish strongly with increasing $n$. The variance is maximum for $m = (n + 1)/2$, i.e. for the median observation where it becomes

$$\sigma_{(n+1)/2}^2 = \frac{N(N + n + 1)}{4(n + 2)}. \tag{3.13'}$$

The variances of the number of exceedances over the largest and the smallest value are

$$\sigma_1^2 = \sigma_n^2 = \frac{nN(N + n + 1)}{(n + 1)^2 (n + 2)}. \tag{3.13''}$$

The quotient of the variances of the median and of the extremes is

$$\frac{\sigma_{(n+1)/2}^2}{\sigma_1^2} = \frac{(n + 1)^2}{4n}. \tag{3.14}$$

Consequently the variance of the median is about $n/4$ times larger than the variance of the extremes. In other words, the extremes are more reliable than the median, and this quality increases with the sample size. This is a generalization of the relation obtained in paragraph 2. Such a behavior seems singular. However, it also holds for the uniform distribution, and for the distribution (1.2) of the frequencies [1].
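The variance formula (3.13) and the quotient (3.14) can be verified in the same way. The sketch below assumes the form of (1.3) used in the code and introduces illustrative function names; it computes the variance directly from the distribution and compares it with the closed form.

```python
from fractions import Fraction
from math import comb

def exceedance_pmf(n, m, N, x):
    """Exact w(n, m, N, x) of (1.3)."""
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))

def variance_direct(n, m, N):
    """Variance of the number of exceedances, summed from the distribution."""
    probs = [exceedance_pmf(n, m, N, x) for x in range(N + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    return sum((x - mean) ** 2 * p for x, p in enumerate(probs))

def variance_closed(n, m, N):
    """Right-hand side of (3.13)."""
    return Fraction(m * N * (n - m + 1) * (N + n + 1), (n + 1) ** 2 * (n + 2))

if __name__ == "__main__":
    n, N = 9, 9
    median_var = variance_closed(n, (n + 1) // 2, N)
    extreme_var = variance_closed(n, 1, N)
    print("median:", median_var, "extreme:", extreme_var,
          "quotient:", median_var / extreme_var)
```

With $n = 9$ the quotient equals $(n + 1)^2/(4n) = 25/9$, whatever the value of $N$.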

In Bernoulli's case, the variance $\sigma_B^2$ is, after replacing $1 - F_m$ by $m/(n + 1)$,

$$\sigma_B^2 = N\, \frac{m}{n + 1} \cdot \frac{n - m + 1}{n + 1},$$

whence, from (3.13),

$$\sigma_m^2 = \sigma_B^2\, \frac{N + n + 1}{n + 2} > \sigma_B^2.$$

The variance in our case is larger than in Bernoulli's case, since we do not assume
the knowledge of the probability which is required for the Bernoulli distribu- 







ti<m I'Vfr.V j- i M, fi^‘ ‘ '•' •'■•' f ‘' ‘ 

«!jfilrilmli<<n. Thi*' i- a ,v !(• •'"‘li" ”! 


if till' Hi*niinilli 


i, Th« mode ami tJie mf lian. A-n Jmi (Si- sh*- < it: dusltl'’ mtiulMT i of 

fXlptOTiaflOfrt IrtTr thf ItP'V I'HI;* 1 * .'(IIIMllK if (()■-tTV.itltill id A fufllH' 

Iriftlit. ii im‘.l mU-^-r ..Jo- -i! iril..itittn »' ■/» 

{or iiu'rrawst w*)li .f fur ?».’ \ o»r'M '• «<•-loh (..tccii" 

(1.1) ' m ’■ h 1 

The mode is obtained from the inequalities

$$w(n, m, N, \tilde x - 1) \le w(n, m, N, \tilde x) \ge w(n, m, N, \tilde x + 1) \tag{4.2}$$

which lead, from (1.5), to

$$(m - 1)\, \frac{N + 1}{n - 1} - 1 \le \tilde x \le (m - 1)\, \frac{N + 1}{n - 1}. \tag{4.3}$$

The length of the interval is unity, as for the Bernoulli distribution.

There are several cases where two modes exist.

a) Let the number of future trials $N$ be such that

$$N = k'(n - 1) - 1 \tag{4.4}$$

where $k'$ is a positive integer. Then the modes are, from (4.3),

$$\tilde x_1 = k'(m - 1) - 1; \quad \tilde x_2 = k'(m - 1). \tag{4.5}$$

b) The modes (4.5) also hold if $n$ and $N$ are large compared to unity, and if $N = k'n$, where $k'$ is again an integer.

c) If $n$ is odd, the median of the initial variate has the rank $m = (n + 1)/2$. If, at the same time, $N$ is odd, there are two modes, namely

$$\tilde x_1 = (N - 1)/2; \quad \tilde x_2 = (N + 1)/2. \tag{4.6}$$

In the case $N = n$, the two modes $\tilde x_1 = m - 1$, and $\tilde x_2 = m$ differ by unity from the modes valid in the two previous cases.

In the case $n = N$, and $m \ne (n + 1)/2$, only one mode exists. To find its location, consider first the case that $n = N$ is even, and $m \le n/2$. Then the upper limit in (4.3) is

$$[m - 1] + \frac{2(m - 1)}{n - 1} \le [m - 1] + 1 - \frac{1}{n - 1} < m.$$

Since the interval has unit length, the mode is $\tilde x = m - 1$. If $m > (n + 1)/2$, the lower limit is

$$[m - 2] + \frac{2(m - 1)}{n - 1} > [m - 1].$$

The case that $n = N$ is odd is treated in the same way, and leads to the following result: The most probable numbers of exceedances over the $m$th value in $N = n$ future trials are

$$\begin{aligned}
&\tilde x = m - 1 \ \text{for}\ m \le n/2; \quad \tilde x = m \ \text{for}\ m \ge (n/2) + 1, && \text{if } n = N \text{ is even,} \\
&\tilde x = m - 1 \ \text{for}\ m < (n + 1)/2; \quad \tilde x = m \ \text{for}\ m > (n + 1)/2, && \text{if } n = N \text{ is odd.}
\end{aligned} \tag{4.7}$$
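The location of the modes can be checked by brute force for the two cases $n = N = 9$ and $n = N = 10$ that the paper uses in Graph 2. The sketch below assumes the form of (1.3) in the code and the interval (4.3) as reconstructed from the recurrence (1.5); function names are illustrative.

```python
from fractions import Fraction
from math import comb

def exceedance_pmf(n, m, N, x):
    """Exact w(n, m, N, x) of (1.3)."""
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))

def modes(n, m, N):
    """All x in 0..N at which w(n, m, N, x) attains its maximum."""
    probs = [exceedance_pmf(n, m, N, x) for x in range(N + 1)]
    top = max(probs)
    return [x for x, p in enumerate(probs) if p == top]

def mode_interval(n, m, N):
    """Lower and upper limits of (4.3); requires n >= 2."""
    upper = Fraction((m - 1) * (N + 1), n - 1)
    return upper - 1, upper

if __name__ == "__main__":
    for n in (9, 10):
        for m in range(1, n + 1):
            low, high = mode_interval(n, m, n)
            print(n, m, modes(n, m, n), (low, high))
```

For $n = N = 9$ the median rank $m = 5$ is the only rank with two modes, which is case c) of the text; every other rank shows a single mode at $m - 1$ or $m$ as stated in (4.7).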


We now consider the median. If the probabilities $w(x)$ are summed up from $x = 0$ onward, there may exist an integer $\breve x_m$ such that the probability for at most $\breve x_m$ exceedances is $\frac{1}{2}$. This is the median number of exceedances. Such a number need not exist. Assume, for example, $N < n$, then the probability $w(n, 1, N, 0)$ alone (see (1.6)) surpasses $\frac{1}{2}$, and the number of exceedances over the largest and the smallest value do not possess a median. If the median $\breve x_m$ exists, it follows from the symmetry (1.4) that $N - \breve x_m - 1$ is the median of the number of exceedances over the $m$th value from below. The relation

$$\breve x_m + {}_m\breve x = N - 1 \tag{4.8}$$

differs from the corresponding relation (3.12) for the mean. In some special cases, the median can be obtained immediately. For $x = 0$, $m = 1$, $n = N$, formula (1.6) leads to

$$w(n, 1, n, 0) = \tfrac{1}{2} = w(n, n, n, n).$$

The probability that the largest (or smallest) of $n$ past observations will never (or always) be exceeded in $n$ future trials is equal to $\frac{1}{2}$. If $n$ and $N$ are odd, and $m = (n + 1)/2$, the summation of equation (1.7) yields, with the help of (1.3'),

$$\sum_{z=0}^{x} w(z) = \sum_{z=N-x}^{N} w(z) = 1 - \sum_{z=x+1}^{N} w(z).$$

Now the median number of exceedances $\breve x$ is such that the two sums on the right sides are equal to $\frac{1}{2}$. Consequently the median number of exceedances in this case is $(N - 1)/2$, which equals $m - 1$ when $N = n$.

We claim that

$$\breve x_m = m - 1 \tag{4.9}$$

for all $m$, provided that $n = N$. For the proof, consider the probability $W(n, m, N, x)$ that the $m$th largest value is exceeded at most $x$ times in $N$ future trials. This is the sum of the first $x + 1$ members $w(x)$. Let $F_\nu(\alpha, \beta, \gamma, 1)$ be the sum of the first $\nu$ members of the hypergeometric series (3.1). Then the substitutions (3.4) and $\nu = x + 1$ lead to

$$W(n, m, N, x) = \frac{\binom{n}{m}}{\binom{N+n}{m}}\ F_{x+1}(m, -N, m - n - N, 1). \tag{4.10}$$



For the sums of the hypergeometric series $F_\nu(\alpha, \beta, \gamma, 1)$ the following recurrence formula [2] is used,

$$\frac{(\gamma - \alpha - \beta)(\gamma - \alpha - \beta + 1) \cdots (\gamma - \beta - 1)}{(\gamma - \alpha)(\gamma - \alpha + 1) \cdots (\gamma - 1)}\ F_\nu(\alpha, \beta, \gamma, 1) = 1 - \frac{\beta(\beta + 1) \cdots (\beta + \nu - 1)}{(\gamma - \alpha)(\gamma - \alpha + 1) \cdots (\gamma - \alpha + \nu - 1)}\ F_\alpha(\nu, \gamma - \alpha - \beta, \gamma - \alpha + \nu, 1). \tag{4.11}$$

The substitutions used in (3.4), and $\nu = x + 1$ lead to

$$\frac{(-n)(-n + 1) \cdots (-n + m - 1)}{(-n - N)(-n - N + 1) \cdots (-n - N + m - 1)}\ F_{x+1}(m, -N, m - n - N, 1) = 1 - \frac{(-N)(-N + 1) \cdots (-N + x)}{(-n - N)(-n - N + 1) \cdots (-n - N + x)}\ F_m(x + 1, -n, -n - N + x + 1, 1).$$

This equation may be written from (4.10)

$$W(n, m, N, x) = 1 - \frac{\binom{N}{x+1}}{\binom{N+n}{x+1}}\ F_m(x + 1, -n, -n - N + x + 1, 1). \tag{4.12}$$

For $x = m - 1$, and $N = n$, the equation becomes

$$W(n, m, n, m - 1) = 1 - \frac{\binom{n}{m}}{\binom{2n}{m}}\ F_m(m, -n, -2n + m, 1).$$

From (4.10) it follows that the second term on the right side is equal to the left side

$$W(n, m, n, m - 1) = \frac{\binom{n}{m}}{\binom{2n}{m}}\ F_m(m, -n, -2n + m, 1).$$

Consequently

$$W(n, m, n, m - 1) = \tfrac{1}{2}. \tag{4.13}$$

If $n = N$, the median number $\breve x_m$ of exceedances over the $m$th largest value is $m - 1$, as stated previously. The means, modes, and medians obtained from the exact formulae (3.11), (4.7) and (4.9) are traced in Graph 2 for $n = N = 9$, and $n = N = 10$.
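The result (4.13) is an exact identity and can therefore be tested with rational arithmetic. The sketch below assumes the form of (1.3) in the code; `cumulative` is an illustrative name for $W(n, m, N, x)$.

```python
from fractions import Fraction
from math import comb

def exceedance_pmf(n, m, N, x):
    """Exact w(n, m, N, x) of (1.3)."""
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))

def cumulative(n, m, N, x):
    """W(n, m, N, x): probability of at most x exceedances."""
    return sum(exceedance_pmf(n, m, N, z) for z in range(x + 1))

if __name__ == "__main__":
    # n = N: the cumulative probability at x = m - 1 should be one half.
    for n in (9, 10):
        print(n, [cumulative(n, m, n, m - 1) for m in range(1, n + 1)])
    # N < n, largest value: w(0) alone exceeds one half, so no median exists.
    print(cumulative(10, 1, 5, 0))
```

The second print illustrates the remark that a median need not exist when $N < n$.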



DrSTRIHUTION' OP EXCEEDANCES 


259 


5. Probabilities of at least one exceedance. If we sum up the probabilities
w(x) from zero up to a certain x (or from a certain x up to N), we obtain the
probabilities W(x) (or P(x)) for at most (or at least) x exceedances over the
mth past value in N future trials

(5.1)    $W(x) = \sum_{z=0}^{x} w(z); \qquad P(x) = \sum_{z=x}^{N} w(z)$

where 


$W(x) + P(x + 1) = 1; \qquad W(x - 1) + P(x) = 1.$

The boundary condition.s are 

$W(0) = w(0); \quad W(N) = 1; \quad P(0) = 1; \quad P(N) = w(N).$

From the symmetry (1.4) it follows that the probability for the mth value from
above to be exceeded at most x times is equal to the probability for the mth
value from below to be exceeded at least N - x times.

From (5.1) and (1.11) it follows for m = 1 (and m = n) that the probabilities
for the largest (or smallest) among n observations to be exceeded at most once
in n future trials converge toward 3/4 (or zero), respectively. If n is large, the
probability that the largest value will be exceeded at most x times in n future
trials is, by virtue of (2.5),

(5.2)    $W(n, 1, n, x) = 1 - (1/2)^{x+1} = P(n, n, n, n - x)$

independent of n.

Consider now the probability that the mth largest value will be exceeded at
least once in N future trials

(5.3)    $P(n, m, N, 1) = 1 - \dfrac{n!\,(N+n-m)!}{(n-m)!\,(N+n)!} = W(n, n-m+1, N, N-1).$

If N and n are large, and m is small, this expression becomes

$P(n, m, N, 1) \sim 1 - \left(\dfrac{n}{N+n}\right)^{m} = W(n, n-m+1, N, N-1).$

For m = 1 and n = N, the probability is 1/2, independent of the size of n.

The least number of exceedances over the smallest value for given probabilities
P, called the tolerance limit, has been derived by S. S. Wilks [3]. A related
problem is the following: How many trials N have to be made in order that there
is a given probability α for the mth largest value to be exceeded at least once? By
virtue of (5.3) we obtain N from

(5.4)    $1 - \alpha = \dfrac{n!\,(N+n-m)!}{(n-m)!\,(N+n)!}.$





For the largest value m = 1, this equation leads to

(5.5)    $\dfrac{N}{n} = \dfrac{1}{1-\alpha} - 1.$

Of course, N/n increases with α. If n is large, and m remains small, equation
(5.4) leads, in first approximation, to

(5.6)    $\dfrac{N}{n} = (1-\alpha)^{-1/m} - 1.$

The quotients N/n as function of α are traced in graph (3). The quotient is
plotted vertically against 1/(1 - α) plotted horizontally, both in logarithmic
scales. The abscissa shows the probability α. The curve for m = 1 is exact. The
corresponding curves for the penultimate and the two preceding values
(m = 2, 3, 4) are obtained from the approximation (5.6). The graph reads in the
following way: The probability that the largest, or second, or third, or fourth
value from above is exceeded at least once in 100n, or 9n, or 3.6n, or 2.2n future
trials is α = .99. Inversely, in 4n future trials the probability that the largest,
or the second, or the third, or fourth extreme value is exceeded at least once
is α = 0.80, or 0.96, or 0.992, or 0.9984, respectively.
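The readings of the graph quoted above can be reproduced directly from (5.5) and (5.6). The following is a minimal sketch; the function names `trials_ratio` and `prob_exceeded` are illustrative and not part of the paper.

```python
def trials_ratio(alpha: float, m: int) -> float:
    """N/n from (5.6): (1 - alpha)^(-1/m) - 1. For m = 1 this is the exact relation (5.5)."""
    return (1.0 - alpha) ** (-1.0 / m) - 1.0


def prob_exceeded(ratio: float, m: int) -> float:
    """Probability alpha obtained by inverting (5.6) for a given quotient N/n."""
    return 1.0 - (1.0 + ratio) ** (-m)


if __name__ == "__main__":
    # Columns: m, N/n needed for alpha = 0.99, alpha reached with N = 4n.
    for m in (1, 2, 3, 4):
        print(m, round(trials_ratio(0.99, m), 2), round(prob_exceeded(4.0, m), 4))
```

For m = 1 the formula gives N/n = 99, which the text rounds to 100n as read from the logarithmic scale.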

In a similar way we calculate the probabilities that the largest (and penultimate)
among n observations is exceeded at least twice in N future trials. Let
$\alpha_2$ be this probability. Then we have for the largest value

$1 - \alpha_2 = w(n, 1, N, 0) + w(n, 1, N, 1) = \dfrac{n}{n+N}\left(1 + \dfrac{N}{n+N-1}\right).$

For n sufficiently large, the expression simplifies to

$1 - \alpha_2 = \dfrac{1 + 2N/n}{(1 + N/n)^{2}}.$


The probability $\alpha_2$ as function of N/n is also traced in Graph (3) and designated
by m = 1, x = 2. Finally, for m = 2 the probability $\alpha_2$ for the penultimate value
to be exceeded at least twice is obtained for large n by



This probability $\alpha_2$ is also traced in Graph (3) and designated by m = 2, x = 2.
If we fix the probabilities $\alpha_2$, the graph shows the number of future trials
corresponding to 1 and 2 exceedances over the largest, the penultimate, and the
two preceding observations.





6. Applications. In 50% of all cases, the largest (or smallest) of n past observations
will not (or always) be exceeded in N = n future trials. The mean number
of exceedances is the mean in the Bernoulli distribution. The variance is largest
for the median, and smallest for the extremes, and the superiority of the extremes
increases with the sample size.

If the previous and the future sample both are large and equal, the distribution
of the number of exceedances over the median observation is normal
with mean and variance of the order n/2, whereas the distribution of the
exceedances over the mth extremes (the law of rare exceedances), similar to the
Poisson distribution, has the mean m and the variance 2m, m being small compared
to the sample size. Elementary calculations lead to the setting of sample
sizes N corresponding to given probabilities for 1 or 2 exceedances over the past
largest and penultimate observation.
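The stated mean m and variance 2m of the law of rare exceedances can be checked numerically for N = n large and m small. This sketch again assumes the closed form of w(n, m, N, x), which is not restated in this passage; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb


def exceedance_pmf(n: int, m: int, N: int) -> list:
    """Probabilities w(n, m, N, x) for x = 0, ..., N.

    Assumed closed form (not restated in this passage):
        w = C(n, m) * m * C(N, x) / ((N + n) * C(N + n - 1, m + x - 1))
    """
    return [
        Fraction(comb(n, m) * m * comb(N, x),
                 (N + n) * comb(N + n - 1, m + x - 1))
        for x in range(N + 1)
    ]


def mean_and_variance(pmf):
    """Exact mean and variance of a distribution given as a list indexed by x."""
    mean = sum(x * p for x, p in enumerate(pmf))
    second = sum(x * x * p for x, p in enumerate(pmf))
    return mean, second - mean * mean


if __name__ == "__main__":
    pmf = exceedance_pmf(500, 3, 500)   # n = N = 500, third largest value
    mean, var = mean_and_variance(pmf)
    print(float(mean), float(var))      # close to m = 3 and 2m = 6
```

With n = N = 500 and m = 3 the exact values already lie within a few percent of m and 2m.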

These methods may be of interest for forecasting floods if, instead of the size
of the flood, we are interested only in the frequency. The same procedure may
also be applied to other meteorological phenomena such as droughts, the extreme
temperatures (the killing frost), the largest precipitations, etc., and permits
forecasting the number of cases surpassing a given severity within the next N years.


REFERENCES

[1] E. J. Gumbel, "Simple tests for given hypotheses," Biometrika, Vol. 32 (1942).

[2] H. von Schelling, "A formula for the partial sums of some hypergeometric series,"
Annals of Math. Stat., Vol. 20 (1949), No. 1.

[3] S. S. Wilks, "Statistical predictions with special reference to the problem of tolerance
limits," Annals of Math. Stat., Vol. 13 (1942).



ON THE ASYMPTOTIC DISTRIBUTION OF THE SUM OF POWERS 
OF UNIT FREQUENCY DIFFERENCES' 

By Bradford F. Kimball
New York State Department of Public Service

1. Summary. Since the "unit" frequency differences (see (2.2) below) are
dependent, the usual methods for establishing the normal character of the
asymptotic distribution of the sum of random variables fail.

However, the essential character of the distribution is disclosed by the integral
functional relationship (3.6). From this it is possible to show that for large
samples the distribution approximates "stability" in the normal sense ([2] and
Lemma 2).

Using the condition that the third logarithmic derivative of the characteristic
function is uniformly bounded for all n on a neighborhood of t = 0 one can
prove that the asymptotic distribution exists and is normal.

2. Introduction. Consider a one dimensional statistical universe characterized
by a cumulative frequency function (cdf) F(x) which is continuous. Consider
an ordered random sample $x_i$ of size N such that

(2.1)    $x_i < x_{i+1}, \quad i = 1 \text{ to } N - 1.$

Consider frequency differences $u_i$ defined by

(2.2)    $u_1 = F(x_1), \quad u_{N+1} = 1 - F(x_N), \quad u_{i+1} = F(x_{i+1}) - F(x_i), \quad i = 1 \text{ to } N - 1.$

Thus

(2.3)    $\sum_{i=1}^{N+1} u_i = 1$

and the formal integral of the probability density function (pdf) of the $u_i$, taken
over the complete sample space of $x_i$, can be written as

(2.4)    $N! \int du_1\, du_2 \cdots du_{h-1}\, du_{h+1} \cdots du_{N+1} = 1,$

where $u_h$ is any $u_i$ which it is found convenient to omit, and the region of
integration is the N-fold Euclidean space bounded by the coordinate hyperplanes

$u_i = 0, \quad i \ne h, \quad i = 1, 2, \cdots, N + 1,$

and the hyperplane

(2.5)    $u_1 + \cdots + u_{h-1} + u_{h+1} + \cdots + u_{N+1} = 1.$

(See _ 
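The construction (2.2) and the identity (2.3) can be illustrated with a short computation. This is a sketch only: the unit exponential universe, the seed, and the function name `unit_frequency_differences` are assumptions chosen for the example; any continuous cdf would serve.

```python
import math
import random


def unit_frequency_differences(sample, cdf):
    """Return u_1, ..., u_{N+1} of (2.2) for a sample of size N and a continuous cdf F."""
    # F is monotone, so sorting the values F(x_i) is the same as ordering the sample.
    f = sorted(cdf(x) for x in sample)
    edges = [0.0] + f + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    rng = random.Random(0)                        # illustrative seed
    exp_cdf = lambda x: 1.0 - math.exp(-x)        # illustrative universe: unit exponential
    sample = [rng.expovariate(1.0) for _ in range(9)]
    u = unit_frequency_differences(sample, exp_cdf)
    print(len(u), sum(u))                         # N + 1 differences summing to 1, as in (2.3)
```

Because the sum in (2.3) is fixed, only N of the N + 1 differences are free, which is why one of them is omitted in the integral (2.4).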

¹ This is the second paper in connection with the subject announced in Abstract No. 9,
Annals of Math. Stat., Vol. 17 (1946), p. 602, and Abstract No. 331, Bull. Am. Math. Soc.,
Vol. 52 (1946), p. 827. For first paper, see [1].


Consider a test function defined by

(2.6)    $y_M = \sum_{M} u_i^{p}, \quad M \le N + 1,$

where p is a real positive number, M is an integer less than or equal to N + 1
and such that if M < N + 1 the $u_i$ which are to be omitted may be arbitrarily
selected, but the subscripts indicating the order relation (2.2) are for the present
retained.

Consider the case where N is odd and M is even, and set

(2.7)    $N = 2n + 1, \quad M = 2m.$

Divide the set of N + 1 frequency differences $u_i$ defined by (2.2) into two
subsets such that each subset contains n + 1 differences of which exactly m are
included in the test function (2.6). Now let N become infinite over odd numbers
$N_1, N_2, \cdots$. In other words the sample size is to increase without limit. For
each sample size $N_j$ in such a sequence, let $M_j$ be an even number such that

(2.8)    $M_j \le N_j + 1$

and such that the ratio $M_j/N_j$ is controlled for large values of N by

(2.9)    $\lim M_j/N_j = \text{constant } c, \quad 0 < c \le 1.$

As above, for each step in the sequence the set of $N_j + 1$ frequency differences
$u_i$ is divided into two subsets of $n_j + 1$ frequencies each with

(2.10)    $N_j = 2n_j + 1, \quad M_j = 2m_j,$

such that $m_j$ frequencies of each subset are included in the test function

(2.11)    $y_{M_j} = \sum_{M_j} u_i^{p}.$

Now we note that for a random sample of size N taken from the above universe,
the characteristic function $G_N(t; y_M)$ may be defined by

(2.12)    $G_N(t; y_M) = N! \int e^{i t y_M}\, du_1\, du_2 \cdots du_N,$

taken over the region in Euclidean space of N dimensions as indicated for the
integral (2.4), taking index h equal to N + 1.

3. Proof of integral relationship: Lemma 1. For simplicity of notation drop
subscripts from $M_j$, $N_j$, $n_j$ and $m_j$. We separate the test function $y_M$ into two
parts $y_m$ and $y_{m'}$ such that

(3.1)    $y_M = y_m + y_{m'} = \sum_{m} u_i^{p} + \sum_{m'} u_i^{p}, \quad m = m' = M/2,$

where the m frequency differences in $y_m$ are those included in the first subset and
those contained in $y_{m'}$ are those of the original M frequencies included in the
second subset (see (2.10) and (2.11)).





The formal integral defining $G_N(t; y_M)$ may be written

(3.2)    $G_N(t; y_M) = \Gamma(2n+2) \int_{R_1} e^{i t y_m}\, du_1 \cdots du_{n+1} \int_{R_2} e^{i t y_{m'}}\, du_{n+2} \cdots du_{2n+1},$

where

$R_1$ = (2n + 1)-dimensional Euclidean space bounded by coordinate hyperplanes
and plane $\sum_{2n+1} u_i = 1$,

$R_2$ = n-dimensional Euclidean space bounded by the coordinate hyperplanes
and the plane

(3.3)    $u_{n+2} + u_{n+3} + \cdots + u_{2n+1} = 1 - w, \qquad w = u_1 + u_2 + \cdots + u_{n+1}.$


Now introduce the transformation to $v_i$

(3.4)    $v_i(1 - w) = u_i, \quad i = n + 2, n + 3, \cdots, 2n + 1, 2n + 2.$

Thus we have

$\sum_{n+1} v_i = 1,$

and the n $v_i$ involved in the integration are bounded above by the hyperplane
$\sum_{n} v_i = 1$. The Jacobian is $(1 - w)^{n}$.

Similarly under transformation

(3.5)    $v_i w = u_i, \quad i = 1, 2, \cdots, n + 1, \qquad \sum_{n+1} v_i = 1.$

Let $v_i$, i = 1, 2, ..., n, and w replace the remaining variables of integration.
Thus the region of integration of these $v_i$ is $v_i \ge 0$ with the hyperplane $\sum_{n} v_i = 1$
furnishing the upper bound. The Jacobian of the transformation is $w^{n}$.

The regions of integration of the two sets of new variables $v_i$ are seen to be
independent of each other and of w. Noting the effect of the above transformations on
$y_m$ and $y_{m'}$, the integral (3.2) will be found to reduce to the following form:

(3.6)    $G_N(t; y_M) = \dfrac{\Gamma(2n+2)}{\Gamma^{2}(n+1)} \int_0^1 w^{n}(1 - w)^{n}\, G_n(t w^{p}; y_m)\, G_n(t(1 - w)^{p}; y_m)\, dw,$

where

$N = 2n + 1, \quad M = 2m.$

Lemma 1. This functional relationship holds for all values of N and M subject
to the condition that N be an odd integer and M an even integer. One may note that a
similar integral functional relationship will hold for any partition $(n_0, n_1)$ of the
N - 1 free frequency differences such that

$n_0 + n_1 = N - 1, \quad m_0 + m_1 = M,$

with corresponding changes in the Gamma functions which precede the integral.





In order to find out what happens when N becomes large the partially normalized
test function is introduced. This is defined by

(3.7)    $z_M = (y_M - \bar{y}_M)(N + 1)^{p}/\sqrt{M},$

where (cf. [1], formula (3.1))

(3.8)    $\bar{y}_M = E(y_M) = \dfrac{M\,\Gamma(N+1)\,\Gamma(p+1)}{\Gamma(N+1+p)}.$

I have referred to $z_M$ as a partially normalized variable, since

$E(z_M) = 0,$

(3.9)    $\lim_{N \to \infty} E(z_M^{2}) = \Gamma(2p+1) - \Gamma^{2}(p+1) - c\,p^{2}\,\Gamma^{2}(p+1),$

where this limit can be shown to be greater than zero for

(3.10)    $p \ne 1, \; 0 < c \le 1; \qquad p = 1, \; 0 < c < 1.$

Recalling the separation of the test function into two parts (see (3.1)) we
define $\bar{y}_m$ and $\bar{y}_{m'}$ by

(3.11)    $\bar{y}_m = \bar{y}_{m'} = \dfrac{m\,\Gamma(n+1)\,\Gamma(p+1)}{\Gamma(n+1+p)}, \qquad M = 2m, \quad N = 2n + 1.$

From Stirling’s formula it can then be shown that 

(3.12) {N + IYUm/VM = (27V2)2[(n + lYyjV^] + o(l), 

where o(1) goes to zero as N and M become infinite subject to the condition
(2.9). Thus if we define $z_m$ and $z_{m'}$ by

(3.13)    $z_m = (y_m - \bar{y}_m)(n + 1)^{p}/\sqrt{m}, \qquad z_{m'} = (y_{m'} - \bar{y}_{m'})(n + 1)^{p}/\sqrt{m},$

since

$y_M = y_m + y_{m'},$

$(N + 1)^{p}/\sqrt{M} = (2^{p}/\sqrt{2})(n + 1)^{p}/\sqrt{m},$

it follows that

(3.14)    $z_M = (2^{p}/\sqrt{2})\,[z_m + z_{m'}] + o(1).$

Hence if we denote the characteristic function of the distribution of the
partially normalized test function by $G_N(t; z_M)$ and proceed to develop an
integral functional relationship similar to (3.6), one arrives at

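(3.15)    $G_N(t; z_M) = \dfrac{\Gamma(2n+2)}{\Gamma^{2}(n+1)} \int_0^1 w^{n}(1 - w)^{n}\, G_n[t\,2^{p} w^{p}/\sqrt{2};\, z_m]\; G_n[t\,2^{p}(1 - w)^{p}/\sqrt{2};\, z_m]\, dw$

with

$N = 2n + 1, \quad M = 2m.$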


4. Resulting functional relationship when N becomes large. The second
lemma shows that the functional equation satisfied by the characteristic function
of a normal distribution is approximated when N is large. Suppose we now set

(4.1)    $w = (1 + s)/2, \quad 1 - w = (1 - s)/2, \quad dw = ds/2.$


Substituting in (3.15) we have

(4.2)    $G_N = \dfrac{\Gamma(2n+2)}{2^{2n+1}\,\Gamma^{2}(n+1)} \int_{-1}^{1} (1 - s^{2})^{n}\, G_n[t(1 + s)^{p}/\sqrt{2};\, z_m]\; G_n[t(1 - s)^{p}/\sqrt{2};\, z_m]\, ds.$

Set


(4.3)    $H(t, s) = G_n[t(1 + s)^{p}/\sqrt{2};\, z_m]\; G_n[t(1 - s)^{p}/\sqrt{2};\, z_m].$

Then

(4.4)    $H_s = G_n' G_n\, p\,t\,(1 + s)^{p-1}/\sqrt{2} \; - \; G_n G_n'\, p\,t\,(1 - s)^{p-1}/\sqrt{2}.$


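Using the law of the mean write

(4.5)    $H(t, s) = H(t, 0) + s\,H_s[t, h(s)], \quad 0 < |h(s)| < |s|.$

Substituting in (4.2) we have

(4.6)    $G_N = H(t, 0) + \dfrac{\Gamma(2n+2)}{2^{2n+1}\,\Gamma^{2}(n+1)} \int_{-1}^{1} s\,(1 - s^{2})^{n}\, H_s[t, h(s)]\, ds.$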

With $E(z_m) = 0$, from the fact that the limiting variance of $z_m$ is bounded
(see (3.9)) it follows that the first derivative of its characteristic function remains
bounded in any finite interval, for all n ([3], p. 90). Thus

(4.7) I GUt; ^n.) 1 < 0 < I 1 1 < i>, for all w 

For the case $p \ge 1$, by virtue of condition (4.7) $H_s$ will remain bounded over the
interval of integration of (4.6) as N becomes infinite. Let B denote such an upper
bound of the absolute value of $H_s$. Then, carrying out the integration

, ^ ^ iir(2a + 2) 1 

txbsoLiitG \ £J<lu.c of intfisrOil -£ 2 ^^^ |T f j ' 2 (ti -{“ 1) 


(4.8) 




for any value of t. This quantity approaches zero as N goes to infinity uniformly
for t on any finite range. For the case that 0 < p < 1 a similar argument may
be used by including the factor $(1 - s)^{p-1}$ which appears in $H_s$ in the integration,
and placing the upper bound on the absolute value of the factor $G_n' G_n$.
Substituting back for H(t, 0) in (4.6) one arrives at

Lemma 2. The characteristic function satisfies the relationship

(4.9)    $G_N(t; z_M) = G_n^{2}(t/\sqrt{2};\, z_m) + o(1), \quad N = 2n + 1, \quad M = 2m,$

where o(1) goes to zero with increasing n, uniformly for t on any finite interval

(4.10)    $0 \le |t| \le b.$

The above lemma indicates that if the asymptotic pdf of $z_M$ exists, it will be a
"stable" distribution in the normal sense [2]. In order to set the stage for proving
the existence of this asymptotic distribution we shall first investigate the third
logarithmic derivative of $G_n(t; z_m)$.
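The sense in which relation (4.9) points to the normal law can be seen numerically: the normal characteristic function satisfies $G(t) = [G(t/\sqrt{2})]^2$ exactly, while a non-normal one does not. The sketch below is illustrative; the function names and the uniform comparison case are assumptions made for the example.

```python
import cmath
import math


def normal_cf(t: float, lam: float) -> complex:
    """Characteristic function exp(-lam * t^2 / 2) of a centered normal law with variance lam."""
    return cmath.exp(-lam * t * t / 2.0)


def uniform_cf(t: float) -> complex:
    """Characteristic function sin(t)/t of the uniform law on (-1, 1), used only for contrast."""
    return complex(1.0) if t == 0.0 else complex(math.sin(t) / t)


def stability_gap(cf, t: float) -> float:
    """|G(t) - G(t / sqrt 2)^2|: the residual of relation (4.9) with the o(1) term dropped."""
    return abs(cf(t) - cf(t / math.sqrt(2.0)) ** 2)


if __name__ == "__main__":
    print(stability_gap(lambda t: normal_cf(t, 1.7), 2.3))   # zero up to rounding
    print(stability_gap(uniform_cf, 1.0))                    # clearly positive
```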

5. Investigation of third logarithmic derivative. We shall now show that the
third logarithmic derivative of G is uniformly bounded in some neighborhood of
t = 0. We first prove that the absolute value of the third derivative of G is
bounded for all t and n. Now the third derivative will have absolute value less
than the third absolute moment which I denote by $\beta_3$. Using Liapounoff's
inequality

(5.1)    $\beta_3^{2} \le \beta_2\,\beta_4$

one asks whether the fourth moment $\beta_4$ remains finite as n and m become infinite.

Computation of the fourth moment about the mean appears to be somewhat
formidable. However it is not so difficult to show that it remains finite with
increasing m and n. Referring to the previous paper ([1], formulas (4.8)-(4.10))
we use the quasi-moment generating function $g_0(x)$ such that

(5.2)    $d^{r} g_0(0)/dx^{r} = \Gamma(pr + 1), \quad g_0(0) = 1,$

and it follows that

(5.3)    $E(y^{r}) = \dfrac{d^{r}[g_0(0)]^{m}}{dx^{r}}\; \dfrac{\Gamma(n+1)}{\Gamma(n+1+pr)},$

and one recalls that

$y = \sum_{m} u_i^{p}$

with

$z = [(n + 1)^{p}/\sqrt{m}]\,[y - \bar{y}].$

The resulting fourth moment of z will be in the form of a fourth degree polynomial
in m whose coefficients are of the type

$\dfrac{(n+1)^{4p}\,\Gamma(n+1)}{\Gamma(n+1+4p)}, \quad \dfrac{(n+1)^{3p}\,\Gamma(n+1)}{\Gamma(n+1+3p)}, \quad \cdots,$

combined with the first moment

$\bar{y} = \dfrac{m\,\Gamma(n+1)\,\Gamma(p+1)}{\Gamma(n+1+p)},$

with $m^{-2}$ appearing as a factor. By expansion of
the Gamma function in asymptotic series in (n + 1) it is not difficult to show
that the coefficient of $m^{4}$ becomes asymptotic like $(n + 1)^{-2}$, and that the
coefficient of $m^{3}$ becomes asymptotic like $(n + 1)^{-1}$. It follows that as n and m
go to infinity with $m \sim c(n + 1)$, this fourth moment approaches a finite
limit. Hence one concludes that the third derivative of G has bounded absolute
value for all n and t.

Since the absolute value of the first derivative of G is uniformly bounded for
finite t and all n it follows from the properties of a characteristic function that
given a positive number K less than unity, it is possible to find a value of $t = t_0$
greater than zero such that

(5.4)    $0 < K < |G_n(t, z)| \le 1, \quad 0 \le |t| \le t_0,$

for all n.

From the above double inequality and the fact that the absolute values of the
first three derivatives are uniformly bounded it follows that the third logarithmic
derivative of G is uniformly bounded for all n on the interval

(5.5)    $0 \le |t| \le t_0.$

6. Proof that the asymptotic distribution of z exists and is normal. Since the
absolute value of G is uniformly bounded away from zero on interval (5.5) one
can write the functional relation (4.9) as

(6.1)    $\log G_N(t, z_M) = 2 \log G_n(t/\sqrt{2},\, z_m) + o(1),$

where o(1) goes to zero with increasing n uniformly for t on interval (5.5).
Introduce the notation:

λ(n) equals variance of $z_m$,

q(t, n) equals third logarithmic derivative of $G_n(t, z_m)$,

R(t, N) equals remainder defined by

(6.2)    $\log G_N(t, z_M) = -\lambda(N)\,t^{2}/2 + R(t, N).$

Write

(6.3)    $\log G_n(t/\sqrt{2},\, z_m) = -\lambda(n)\,t^{2}/4 + q(\theta t/\sqrt{2},\, n)\,t^{3}/(12\sqrt{2}), \quad 0 < \theta < 1.$

Substituting (6.3) and (6.2) in (6.1)

(6.4)    $R(t, N) = [\lambda(N) - \lambda(n)]\,t^{2}/2 + [1/\sqrt{2}]\,q(\theta t/\sqrt{2},\, n)\,t^{3}/6 + o(1).$

By (3.9)

(6.5)    $\lim \lambda(n) = \lim \lambda(N) = \text{positive number } \lambda.$

We have proved that there exists an upper bound U such that

(6.6)    $|q(t, n)| < U$





for all n and for t on interval

(6.7)    $0 \le |t| \le t_0.$

Hence from (6.4) one can reason that given a positive ε, a number $N_0$ can be
found such that

(6.8)    $|R(t, N)| < [1/\sqrt{2}]\,U\,|t^{3}/6| + \varepsilon,$

for all t on (6.7) and for $N > N_0$.

By (6.1)

(6.9)    $R(t, 2N + 1) = [\lambda(2N + 1) - \lambda(N)]\,t^{2}/2 + 2R(t/\sqrt{2},\, N) + o(1).$

Using (6.8)

$|R(t/\sqrt{2},\, N)| < [1/\sqrt{2}]\,U\,|t^{3}/(12\sqrt{2})| + \varepsilon.$

Hence for any positive number $\varepsilon_1$ a number $N_1$ can be found such that

$|R(t, N)| < (1/2)\,U\,|t^{3}/6| + 2\varepsilon + \varepsilon_1, \quad N > N_1,$

for all t on (6.7). After k such operations, taking $\varepsilon_i = \varepsilon$,

(6.10)    $|R(t, N)| < (1/2)^{k/2}\,U\,|t^{3}/6| + (2^{k} - 1)\,\varepsilon, \quad N > N_k.$

Thus given a positive number d one can determine k such that

$(1/2)^{k/2}\,U\,t_0^{3}/6 < d/2,$

and ε such that

$2^{k}\varepsilon < d/2,$

and therefore a number $N_{k+1}$ such that

(6.11)    $|R(t, N)| < d, \quad N > N_{k+1},$

for all t on interval (6.7).

It follows that $G_N(t, z_M)$ converges uniformly to $\exp(-\lambda t^{2}/2)$ on interval (6.7).
Convergence of $G_N(t, z_M)$ for a value $t = t_1$ outside the interval (6.7) may be
proved by choosing integer k such that

(6.12)    $0 < |t_1|/(\sqrt{2})^{k} < t_0,$

and taking

$t_2 = t_1/(\sqrt{2})^{k}.$

Recalling that the functional relation (4.9) holds for all finite t, this can be
applied k times, thus building up $t_2$ to $t_1$.
It follows from the continuity theorem that the distribution function of $z_M$
converges to the normal distribution function.

7. Statement of theorem proved. The proof given above has involved the
restriction that N be odd and M even (see (2.7)). This restriction is required
for the integral relationship (3.6). However, if N were even one could take
$n_0 = N/2$ and $n_1 = n_0 - 1$ and deal with $G_{n_0}$ and $G_{n_1}$ in the integrand. Also
if M were odd, one could take $m_0 = (M + 1)/2$, $m_1 = m_0 - 1$, and deal with
$G_{n_0}(t, m_0)$ and $G_{n_1}(t, m_1)$ in the integrand. This would of course carry with it
corresponding changes in the Gamma functions which precede the integral.
As long as we require that

$N = n_0 + n_1 + 1, \quad M = m_0 + m_1,$

$\lim M/N = \lim m_0/n_0 = \lim m_1/n_1 = c > 0,$

the arguments used in arriving at the asymptotic relations (3.15) and (4.9)
will apply. Hence the theorem:

Theorem². For a one dimensional statistical universe whose cdf is continuous,
consider the function of the unit frequency differences $u_i$

(7.1)    $y = \sum_{m} u_i^{p}$

taken from an ordered random sample of size n (see (2.2)) where p is any real
positive number, and m is any positive integer less than or equal to n + 1. The
selection of which m unit frequencies are to be included is arbitrary. Then with

(7.2)    $\bar{y} = E(y) = \dfrac{m\,\Gamma(n+1)\,\Gamma(p+1)}{\Gamma(n+p+1)}$

consider the partially normalized variable

(7.3)    $z = \dfrac{(n+1)^{p}}{\sqrt{m}}\,(y - \bar{y}).$

If n goes to infinity, with m becoming infinite so that

(7.4)    $\lim m/n = c > 0,$

then the asymptotic cumulative distribution of z exists and is normal, with

(7.5)    $\lim E(z^{2}) = \Gamma(2p+1) - \Gamma^{2}(p+1) - c\,p^{2}\,\Gamma^{2}(p+1),$

except in the trivial case p = 1, m = n + 1, in which case z ≡ 0, and in the case
p = 1, c = 1.
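The limit (7.5) can be compared with the exact finite-sample variance of z. The sketch below is not taken from the paper: it assumes the standard product-moment formula for the n + 1 unit frequency differences, which are jointly uniform on the simplex (Dirichlet with unit parameters), namely $E(u_i^{a} u_j^{b}) = \Gamma(a+1)\Gamma(b+1)\Gamma(n+1)/\Gamma(n+1+a+b)$. The function names are illustrative.

```python
import math


def limit_variance(p: float, c: float) -> float:
    """Right-hand side of (7.5)."""
    g1 = math.gamma(p + 1.0)
    return math.gamma(2.0 * p + 1.0) - g1 * g1 - c * p * p * g1 * g1


def exact_variance(n: int, m: int, p: float) -> float:
    """Var(z) for finite n and m, using the assumed Dirichlet(1, ..., 1) moments."""
    def scaled(a: float) -> float:
        # (n + 1)^a * Gamma(n + 1) / Gamma(n + 1 + a); tends to 1 as n grows.
        return math.exp(a * math.log(n + 1.0)
                        + math.lgamma(n + 1.0) - math.lgamma(n + 1.0 + a))

    g1 = math.gamma(p + 1.0)
    a2 = scaled(2.0 * p)      # normalizes E(u^(2p)) and E(u_i^p u_j^p)
    b = scaled(p) ** 2        # normalizes [E(u^p)]^2
    # Var(z) = (n+1)^(2p)/m * [m E(u^2p) + m(m-1) E(u_i^p u_j^p) - m^2 E(u^p)^2]
    return math.gamma(2.0 * p + 1.0) * a2 - g1 * g1 * a2 + m * g1 * g1 * (a2 - b)


if __name__ == "__main__":
    n = 10_000
    print(exact_variance(n, n + 1, 2.0), limit_variance(2.0, 1.0))   # both near 4
    print(exact_variance(n, n // 2, 0.5), limit_variance(0.5, 0.5))
    print(exact_variance(n, n + 1, 1.0))   # trivial case p = 1, m = n + 1: zero up to rounding
```

The terms are grouped so that the difference `a2 - b`, which is of order 1/n, is formed before multiplication by m; this keeps the cancellation numerically stable.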

REFERENCES

[1] B. F. Kimball, "Some basic theorems for developing tests of fit for the case of the
non-parametric probability distribution, I", Annals of Math. Stat., Vol. 18 (1947),
No. 4, pp. 540-548.

[2] P. Lévy, Théorie de l'Addition des Variables Aléatoires, Gauthier-Villars, Paris, 1937,
Chapter V.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton,
1945.

[4] P. A. P. Moran, "Random Division of an Interval", Jour. Roy. Stat. Soc., Suppl.,
Vol. 9 (1947), pp. 92-98.

² For the case p = 2, m = n + 1, an interesting proof was published by P. A. P. Moran
in 1947, see [4].



EFFECT OF LINEAR TRUNCATION ON A MULTINORMAL POPULATION¹

By Z. W. Birnbaum²

University of

1. Introduction. In admission to educational institutions, personnel selection,
testing of materials, and other practical situations, the following mathematical
model is frequently encountered: A (k + l)-dimensional random variable
$(X_1, X_2, \cdots, X_k, Y_1, Y_2, \cdots, Y_l) = (X, Y)$ is considered, with a joint
probability-distribution assumed to be non-singular multi-normal. The $Y_1, Y_2, \cdots, Y_l$ are
scores in admission tests, the $X_1, X_2, \cdots, X_k$ scores in achievement tests. The
admission tests are administered to all individuals in the (X, Y) population
to decide on admission or rejection, and (usually at some later time) the achievement
tests are administered to those admitted. A set of weights $a_j \ge 0$,
j = 1, 2, ..., l, is used to define a composite admission test score, $U = \sum_{j=1}^{l} a_j Y_j$,
and a "cutting score" τ is chosen so that an individual is admitted if U ≥ τ,
and rejected if U < τ. We will refer to this procedure as linear truncation of
(X, Y) in Y to the set U ≥ τ.

A linear truncation in Y clearly will change the absolute distribution of X,
except in the case of independence. In this paper a study is made of the absolute
distribution of X after linear truncation in Y in the case k = 1; in particular,
the possibility is investigated of choosing the $a_j$ and τ in such a way that the
distribution of X after truncation has certain desirable properties. The case
k > 1 leads to a considerable diversity of problems which are being studied and,
it is hoped, will be the subject of a separate paper.

Throughout this paper it will be assumed that all the parameters of (X, Y),
that is the expectations, variances and covariances before truncation, are
known. In practical situations it often happens that only the parameters of
$Y_1, Y_2, \cdots, Y_l$ before truncation are known, while the first and second moments
involving $X_1, X_2, \cdots, X_k$ are only known for the joint distribution after
truncation. It can be shown [1] that in such situations the expectations, variances
and covariances of (X, Y) before truncation can always be reconstructed if
(X, Y) has a multinormal distribution.

In the simplest case k = l = 1 the probability-density of the original binormal
random variable (X, Y) may be, without loss of generality, assumed
equal to

(1.1)    $f(X, Y; \rho) = \dfrac{1}{2\pi\sqrt{1-\rho^{2}}}\; e^{-(X^{2} - 2\rho XY + Y^{2})/2(1-\rho^{2})}.$

By truncating this distribution in Y to the set Y ≥ τ one obtains the probability-density

(1.2)    $g(X, Y; \rho, \tau) = \psi^{-1}(\tau)\, f(X, Y; \rho)$ for $Y \ge \tau$, and $g(X, Y; \rho, \tau) = 0$ for $Y < \tau$,

¹ Presented to the Institute of Mathematical Statistics on June 18, 1949.

² Research done under the sponsorship of the Office of Naval Research.



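where

(1.3)    $\psi(\tau) = \dfrac{1}{\sqrt{2\pi}} \int_{\tau}^{\infty} e^{-t^{2}/2}\, dt.$

For further use we introduce the abbreviations

(1.4)    $\varphi(\tau) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\tau^{2}/2},$

(1.5)    $\lambda(\tau) = \dfrac{\varphi(\tau)}{\psi(\tau)}.$

We also note the inequalities

(1.6)    $\tau < \lambda(\tau)$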

and 

X/'I + T" — 


(1 7) 


< 


derived in [2] and [3], respectively.³

Before proceeding to the more-dimensional case, we will study some properties
of the marginal probability-distribution of X after truncation to Y ≥ τ

(1.8)    $\varphi(X; \rho, \tau) = \int_{\tau}^{\infty} g(X, Y; \rho, \tau)\, dY.$


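2. The moments of $\varphi(X; \rho, \tau)$. In this section all mathematical expectations
are computed for the absolute distribution of X after truncation of (X, Y) to
Y ≥ τ.

We have

$\varphi(X; \rho, \tau) = \psi^{-1}(\tau)\, \varphi(X)\, \psi\!\left(\dfrac{\tau - \rho X}{\sqrt{1-\rho^{2}}}\right)$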


and hence 


E{X") ^ [ X’‘^{X,p,r) 

J—00 


dX 


, !’+«• Y yn- 

= r‘(r) I ~~ 


1-1 

60 'V^^TT •'(T—p 

- (^F#) [1. 

³ Implicitly, the inequality (1.6) was known already to Laplace, cf. Mécanique Céleste,
transl. by Bowditch, Boston 1839, Vol. 4, p. 493.


dS dX 





_p f" 

'/'W'v/i — p' J- 


rZX" 


75? 

' (LY 
rrom Ihc identity 
( 2 . 0 ) 

we obtain 


-b 




I - 

Vi - P= 


fzr. 


«■(»'" (vi 'v) ■ (vi -7 



= Vl — P''p(t) 

and hence

(2.1)    $E(X^{n}) = (n - 1)\,E(X^{n-2}) + \rho\lambda(\tau) \int_{-\infty}^{\infty} \left(S\sqrt{1-\rho^{2}} + \rho\tau\right)^{n-1} \varphi(S)\, dS,$

for n ≥ 1.

For n = 1 this yields for the expectation of X after truncation

(2.2)    $E(X) = \rho\lambda(\tau).$

For n = 2 we have from (2.1)

$E(X^{2}) = 1 + \rho^{2}\tau\lambda(\tau) = 1 + \rho\tau E(X),$

and hence for the variance of X after truncation the expression

(2.3)    $\sigma^{2}(X) = 1 + E(X)[\rho\tau - E(X)],$

or

(2.31)    $\sigma^{2}(X) = 1 - \rho^{2}\lambda(\tau)[\lambda(\tau) - \tau].$

From (2.2) we see that E(X) always has the sign of ρ, as one would expect.
From (2.3) one finds a lower bound for τ

(2.4)    $\tau > \dfrac{E^{2}(X) - 1}{\rho E(X)}.$

From (2.31) and (1.6) one concludes that $\sigma^{2}(X) < 1$ for ρ ≠ 0, hence the
variance of X after truncation is always less than the variance before truncation,
except if ρ = 0.
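Formulas (2.2), (2.3) and (2.31) are easy to evaluate numerically. The sketch below assumes the usual reading of the abbreviations: φ is the standard normal density, ψ its upper tail probability, and λ = φ/ψ. The function names are illustrative.

```python
import math


def phi(t: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def psi(t: float) -> float:
    """Upper tail probability of the standard normal law."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def lam(t: float) -> float:
    """lambda(tau) = phi(tau) / psi(tau)."""
    return phi(t) / psi(t)


def truncated_mean(rho: float, tau: float) -> float:
    """E(X) after truncation to Y >= tau, equation (2.2)."""
    return rho * lam(tau)


def truncated_variance(rho: float, tau: float) -> float:
    """Variance of X after truncation, equation (2.31)."""
    l = lam(tau)
    return 1.0 - rho * rho * l * (l - tau)


if __name__ == "__main__":
    rho, tau = 0.8, 0.5   # illustrative values
    m = truncated_mean(rho, tau)
    v = truncated_variance(rho, tau)
    # (2.3) and (2.31) are two forms of the same quantity.
    print(m, v, 1.0 + m * (rho * tau - m))
```

Because λ(τ) > τ by (1.6), the subtracted term in `truncated_variance` is positive whenever ρ ≠ 0, which is the statement that truncation reduces the variance.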

Similarly one computes from (2.1) the third moment about zero

$E(X^{3}) = E(X)[3 - \rho^{2}(1 - \tau^{2})]$

and obtains for the third moment about the expectation

(2.5)    $E[X - E(X)]^{3} = E(X)\,\rho^{2}\{[\lambda(\tau) - \tau][2\lambda(\tau) - \tau] - 1\}.$

Numerical computation indicates that the quantity in braces is always > 0,
which would mean that the skewness of X after truncation has the same sign as
E(X) and ρ. No analytic proof of this statement has been obtained.


3. Determination of τ for given expectation or quantile of X after truncation;
dependence of this τ on ρ. Let it be required to determine τ so that the expectation
of X after truncation assumes a given value m. It follows immediately from
(2.2) that this τ is obtained by solving the equation

(3.1)    $\lambda(\tau) = \dfrac{m}{\rho}$

for τ, which can be done with the aid of a table⁴ of λ(τ).
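In place of a table, equation (3.1) can be solved numerically. The sketch below uses bisection, which is not the paper's method; it relies on λ(τ) being increasing in τ, and on the usual reading of λ as the ratio of the standard normal density to its upper tail probability. The function name `solve_tau` and the bracketing interval are assumptions made for the example.

```python
import math


def lam(t: float) -> float:
    """lambda(tau): standard normal density divided by the upper tail probability."""
    density = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(t / math.sqrt(2.0))
    return density / tail


def solve_tau(m: float, rho: float, lo: float = -10.0, hi: float = 10.0) -> float:
    """Solve lambda(tau) = m / rho for tau by bisection on the illustrative bracket [lo, hi]."""
    target = m / rho
    if not (lam(lo) < target < lam(hi)):
        # lambda is positive, so m and rho must have the same sign for a solution to exist.
        raise ValueError("m / rho lies outside the range of lambda on the bracket")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lam(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    tau = solve_tau(m=0.5, rho=0.8)      # illustrative values
    print(tau, 0.8 * lam(tau))           # the second number reproduces m = 0.5
```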

Another problem which occurs in applications consists in determining τ so
that, for given 0 < α < 1 and $X_\alpha$, the α-quantile for X after truncation assumes
the value $X_\alpha$, that is so that

(3.2)    $\int_{-\infty}^{X_\alpha} \varphi(X; \rho, \tau)\, dX = \psi^{-1}(\tau) \int_{-\infty}^{X_\alpha} \int_{\tau}^{\infty} f(X, Y; \rho)\, dY\, dX = \alpha.$

Let

(3.21)    $P(s, t; \rho) = \int_{s}^{\infty} \int_{t}^{\infty} f(X, Y; \rho)\, dY\, dX$

denote the volume of the probability solid Z = f(X, Y; ρ) above the quadrant
X ≥ s, Y ≥ t. Then (3.2) may be written in the form

$1 - \dfrac{P(X_\alpha, \tau; \rho)}{\psi(\tau)} = \alpha$

or

(3.3)    $(1 - \alpha)\,\psi(\tau) = P(X_\alpha, \tau; \rho),$

and this equation can be solved for τ by trial with the aid of tables of ψ(τ) and
Pearson's tables [4] of P(s, t; ρ).

Lemma 1. For fixed expectation of X after truncation E(X) = m, the solution τ(ρ)
of (3.1) is a strictly decreasing function of the absolute value of ρ for 0 < |ρ| < 1.

Proof: Differentiating m = ρλ(τ) with regard to ρ one obtains

$0 = \lambda(\tau) + \rho\lambda'(\tau)\,\dfrac{d\tau}{d\rho}$

and, in view of the identity

$\lambda'(\tau) = \lambda(\tau)[\lambda(\tau) - \tau],$

the expression

(3.4)    $\dfrac{d\tau}{d\rho} = -\dfrac{1}{\rho[\lambda(\tau) - \tau]}.$

⁴ A table of 1/λ(τ) is, for example, given in Karl Pearson, Tables for Statisticians and
Biometricians, Part II, 1931, pp. 11-15.





From (3.4) and (1.6) we see that

$\text{sign}\, \dfrac{d\tau}{d\rho} = -\text{sign}\, \rho,$

which proves our lemma.

Lemma 2. For fixed α, $X_\alpha$, the solution τ = τ(ρ) of (3.3) is a strictly decreasing
function of |ρ| for 0 < |ρ| < 1.

Proof: Differentiating (3.3) with regard to ρ one obtains

y y dr dP dr I dP 

-<iip “ a? 


and hence 
(3.6) 


_ dP 

dr _ dp 

^ + (1 - aUr) 


From (3.21) one easily verifies that 

= V,(r)(l • 
dp 

and therefore 

dP(Xa, r, p) 


)"‘ r _ dt, 

’’(Xa-P'dVl-P’ 


(3.7) 

One also computes 


dp 


> 0 . 


dPjXg , r; p) 
3t 




/ Xg — pr \ 


so that the denominator of the right hand expression in (3 6) becomes 

fXg - PT 


‘Pir 


1 — a — V' 


Vl - 


In view of (3.3) this is equal to 


<p{r) 


'P{Xg,r,p) 


fiir) 


- i' 


Xg - 


Vl 


- Pr Y 


= \(t) 


[p(X.. r; p) - .Mt)* 

2ir Jt J (x.-pri/Vi-p' 

1 /“ 


dUdY 





If p > 0, tlicn pY > pr in the interval of integration t < F < co, hence 

^ 1 =? < 2 ) therefore the integrand h{Y) is positive, and so is the 

denominator of (3.6). Similarly one sees that if ρ < 0 the integrand h(Y) is
negative for τ ≤ Y < ∞ and the denominator of (3.6) is negative. In view of
(3.7) we conclude

$\text{sign}\, \dfrac{d\tau}{d\rho} = -\text{sign}\, \rho \quad \text{for } \rho \ne 0.$

4. Linear truncation of $(X, Y_1, Y_2, \cdots, Y_l)$ to the set $\sum_{j=1}^{l} a_j Y_j \ge \tau$ for
given expectation or quantile of X, minimizing the rejected part of the population.
Let $(X, Y_1, Y_2, \cdots, Y_l)$ be an (l + 1)-dimensional non-singular normal random
variable with all expectations, variances and covariances known. We wish to
choose $a_1, a_2, \cdots, a_l$ and τ so that by setting

(4.1)    $U = \sum_{j=1}^{l} a_j Y_j$

and performing the linear truncation to the set U ≥ τ we obtain for the expectation
of X after truncation a pre-assigned value m, and that this is achieved with
the least waste of the original population, that is so that for the non-truncated
probability-distribution the probability $P(\sum_{j=1}^{l} a_j Y_j < \tau)$ is minimum.
Without loss of generality we may assume that, before truncation, we have

(4.21)    $E(X) = E(Y_1) = \cdots = E(Y_l) = 0,$

(4.22)    $\sigma^{2}(X) = 1,$

and thus

(4.3)    $E(U) = 0.$

Furthermore, the $a_j$ and τ can always be multiplied by a constant, without
changing the set of truncation, so that we have

(4.4)    $\sigma^{2}(U) = 1.$

Theorem 1. To truncate $(X, Y_1, Y_2, \cdots, Y_l)$ linearly in $Y_1, Y_2, \cdots, Y_l$ so
that the expectation of X after truncation has the given value m and that the probability
of the rejected part of the original population is minimum, it is necessary and
sufficient (1) to determine $a_1, a_2, \cdots, a_l$ so that the absolute value of the
correlation-coefficient ρ(X, U) becomes maximum under the condition (4.4), and (2) for U
determined by these $a_1, a_2, \cdots, a_l$ and for ρ = ρ(X, U) to solve equation (3.1) for τ.

The proof of this theorem follows immediately from the first paragraph of
section 3 and Lemma 1.

Using the second paragraph of section 3 and Lemma 2, one equally easily
arrives at the following theorem:

Theorem 2. To tnincate {X, Fi, Fa, • • • , Fi) linearly in Yi, Fa, • • • , Yt 



278 


Z. W. BIRNHAITM 


SO that ilie a-qxianhlc of X after iruncatian has the given value Xa and that the 
■probahility of the rejected part of the original population is minimum, it is necessary 
and sufficient to satisfy (1) in Theorem 1 and than to solve equation (3.3) 

The problem of satisfying requirement (1) of Theorems 1 and 2 can be solved
effectively by a method due to Hotelling [5]. It may be worth noting that this
method yields two sets of constants, $a_1, a_2, \ldots, a_l$ and $-a_1, -a_2, \ldots, -a_l$,
both maximizing $|\rho(X, U)|$ but leading to values of $\rho(X, U)$ with opposite
signs. Nevertheless the choice between $a_1, a_2, \ldots, a_l$ and $-a_1, -a_2, \ldots, -a_l$
and the determination of $\tau$ are unique for any given m, since (3.1) has a solution
for $\tau$ only if sign $\rho$ = sign m.
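Requirement (1) can be illustrated numerically. The following is a minimal sketch, not Hotelling's computational scheme from [5]: it assumes that the covariance matrix of the $Y_j$ (here `cov_yy`) and the covariances of X with the $Y_j$ (here `cov_xy`) are known, with $\sigma^2(X) = 1$, and it uses the standard regression closed form in which the weights are proportional to the inverse covariance matrix applied to `cov_xy`. All function and variable names are hypothetical.

```python
import math

def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def max_correlation_weights(cov_yy, cov_xy, sign=1):
    """Return (a, rho) with var(U) = 1 and rho = corr(X, U), assuming var(X) = 1."""
    raw = solve_linear(cov_yy, cov_xy)  # inverse covariance applied to cov_xy
    rho_abs = math.sqrt(sum(r * c for r, c in zip(raw, cov_xy)))
    a = [sign * r / rho_abs for r in raw]  # rescaled so that var(U) = 1
    return a, sign * rho_abs

# Illustrative (made-up) second moments for two variables Y_1, Y_2.
cov_yy = [[1.0, 0.5], [0.5, 1.0]]
cov_xy = [0.6, 0.3]
a_pos, rho_pos = max_correlation_weights(cov_yy, cov_xy, sign=1)
a_neg, rho_neg = max_correlation_weights(cov_yy, cov_xy, sign=-1)
print(a_pos, rho_pos)
print(a_neg, rho_neg)
```

The two calls return the two sets of constants mentioned above; which one is used is fixed by the sign of m.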


5. Linear truncation of $(X, Y_1, Y_2, \ldots, Y_l)$ to the set $\sum_{j=1}^{l} a_j Y_j \ge \tau$ for given
expectation of X after truncation, minimizing the variance of X after truncation.
It may be of practical interest to choose $a_1, a_2, \ldots, a_l$ and $\tau$ so that, with
the notations and under the assumptions of section 4, the expectation of X
after truncation becomes equal to a given number m, and the variance after
truncation is minimum.

Theorem 3. To truncate $(X, Y_1, Y_2, \ldots, Y_l)$ linearly in $Y_1, Y_2, \ldots, Y_l$ so
that the expectation after truncation has the given value m and that, under this
condition, the variance of X after truncation becomes as small as possible, it is
necessary and sufficient to satisfy the conditions (1) and (2) of Theorem 1.

The proof of this theorem follows from section 3 and the following lemma; 

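Lemma 3. For fixed $E(X) = m$, the variance $\sigma^2(X)$ after truncation is a strictly
decreasing function of the absolute value of $\rho$ for $0 < |\rho| < 1$.

Proof: According to (2.3) we have

$\sigma^2(X) = 1 + m(\rho\tau - m).$

Differentiating with regard to $\rho$ and using (3.4) we have

$\frac{d\sigma^2(X)}{d\rho} = m\,\frac{\tau[\lambda(\tau) - \tau] - 1}{\lambda(\tau) - \tau}.$
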
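For $\tau \le 0$ the numerator $\tau[\lambda(\tau) - \tau] - 1$ clearly is $< 0$. For $\tau > 0$ inequality (1.7) yields

$\tau[\lambda(\tau) - \tau] - 1 < \tfrac{1}{2}\left(\tau\sqrt{4 + \tau^2} - 3\tau^2 - 2\right) \le \tfrac{1}{2}\left[\tau(2 + \tau) - 3\tau^2 - 2\right] = \tau(1 - \tau) - 1,$

and this is $< 0$ for $\tau > 0$. Together with (1.6), this proves that

$\frac{\tau[\lambda(\tau) - \tau] - 1}{\lambda(\tau) - \tau} < 0$

for all $\tau$, and hence according to (3.1)

$\operatorname{sign} \frac{d\sigma^2(X)}{d\rho} = -\operatorname{sign} m = -\operatorname{sign} \rho.$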
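The monotonicity asserted in Lemma 3 can be checked numerically. The sketch below rests on two assumptions that are not quoted in this section: that $\lambda(\tau)$ denotes the ratio of the standard normal density to its upper tail probability (the reciprocal of Mills' ratio), and that equation (3.1) reads $m = \rho\lambda(\tau)$, the form consistent with the variance expression $1 + m(\rho\tau - m)$. Function names are hypothetical.

```python
import math

def hazard(tau):
    """Assumed lambda(tau): normal density divided by upper tail probability."""
    density = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(tau / math.sqrt(2.0))
    return density / upper_tail

def threshold_for_mean(m, rho, lo=-10.0, hi=10.0):
    """Solve m = rho * lambda(tau) for tau by bisection (lambda is increasing)."""
    target = m / rho
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if hazard(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def variance_after_truncation(m, rho):
    """Variance of X after truncation, 1 + m * (rho * tau - m)."""
    tau = threshold_for_mean(m, rho)
    return 1.0 + m * (rho * tau - m)

m = 0.3
rhos = (0.4, 0.5, 0.7, 0.9)
variances = [variance_after_truncation(m, rho) for rho in rhos]
for rho, var in zip(rhos, variances):
    print(rho, round(var, 4))
```

For the fixed expectation m = 0.3 the printed variances shrink as $\rho$ grows, in line with the lemma.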





It may be conjectured that the sign of $d\sigma^2(X)/d\rho$ is opposite to that of $\rho$ also in the
case when $\sigma^2(X)$ is the variance after truncation minimized under condition
(3.3). This would lead to a theorem stating that the same choice of $a_1, a_2, \ldots, a_l$
and $\tau$ which according to Theorem 2 makes the $\alpha$-quantile after truncation
equal to the given number $x_\alpha$ and minimizes the rejected part of the original
population, will also minimize the variance of X after truncation.


REFERENCES

[1] Z. W. Birnbaum, E. Paulson and F. C. Andrews, "On the effect of selection performed
on some coordinates of a multi-dimensional population", Psychometrika, Vol. 15
(1950).

[2] R. D. Gordon, "Values of Mill's ratio of area to bounding ordinate of the normal
probability integral for large values of the argument", Annals of Math. Stat.,
Vol. 12 (1941), pp. 364-366.

[3] Z. W. Birnbaum, "An inequality for Mill's ratio", Annals of Math. Stat., Vol. 13 (1942),
pp. 245-246.

[4] K. Pearson, Tables for Statisticians and Biometricians, Part II, 1st ed., Cambridge
Univ. Press, 1931, Tables VIII and IX.

[5] H. Hotelling, "Relations between two sets of variates", Biometrika, Vol. 28 (1936),
pp. 321-377.



NOTES 

This section is devoted to brief research and expository articles and other short items. 


EXTENSION OF A THEOREM OF BLACKWELL¹
By E. W. Barankin
University of California, Berkeley

1. Introduction. In [1] (§1) the author has announced, as bearing on the
results there, that Blackwell's method [2] of uniformly improving the variance
of an unbiased estimate by taking the conditional expectation with respect to a
sufficient statistic, is in fact similarly effective on every absolute central moment
of order $s \ge 1$. Our purpose here is to establish this. In addition, the equality
condition (null improvement of the moment) is presented in terms of a primitive
property of the estimate. The asserted uniform diminution of the s-th moments
for a family $\mathfrak{M}$ of distributions is, as in the case s = 2, a twice removed
consequence of the fundamental fact for a single distribution that the absolute s-th
power of the conditional expectation of a measurable function is almost everywhere
(a.e.) not greater than the conditional expectation of the absolute s-th
power of the function. This is the substance of the theorem below. The second
corollary then states the result for unbiased estimates.

2. Preliminaries. Let $\Omega$ be a space of points $x$; $\mathfrak{F}$, a $\sigma$-field of subsets of $\Omega$;
and $\mu$, a probability measure on $\mathfrak{F}$. Let $t$ be a function on $\Omega$, onto a space $\Gamma$ of
points $\tau$; $\mathfrak{T}$, a $\sigma$-field of subsets of $\Gamma$; and $\mathfrak{A}$, a sub-$\sigma$-field of $\mathfrak{F}$, the inverse of
$\mathfrak{T}$ under $t$. A set in $\mathfrak{T}$ will be denoted by $A^t$, where $A$ is its inverse under $t$.
Let $\nu$ denote the measure on $\mathfrak{T}$ defined by $\nu(A^t) = \mu(A)$.

If $f$ is a real-valued,² $\mathfrak{F}$-measurable, $\mu$-integrable function on $\Omega$, we denote by
$E(f \mid \cdot)$ the conditional expectation of $f$ with respect to $t$. Corresponding to any
particular function $h$ on $\Gamma$ (as, for example, $E(f \mid \cdot)$) we define the function
$h^*$ on $\Omega$ by

$h^*(x) = h(\tau), \quad t(x) = \tau.$

The qualification "essentially" prefixing a statement will mean that with the
possible exception of a set of points of measure 0, that statement holds true.

The following two simple lemmas enable us to present the conditions for
equality, in the results below, in terms of the elementary characteristics of the
function $f$.


¹ This note was prepared under O.N.R. contract.

² With no changes in this note, and only minor changes in [1], the results we have set
forth concerning unbiased estimation pertain as well to complex-valued functions.



Lemma 1. A necessary and sufficient condition that $\operatorname{sgn} f(x) = \operatorname{sgn} E^*(f \mid x)$
a.e. ($\mu$) is that $\operatorname{sgn} f$ be essentially a function of $t$.

The necessity of the condition is clear. To prove sufficiency, let $f'$ be a function
on $\Omega$ which is a.e. equal to $f$, and such that $\operatorname{sgn} f'$ is an (unqualified) function of $t$.
Now if $\operatorname{sgn} f'(x) = \operatorname{sgn} E^*(f \mid x)$ does not hold a.e. ($\mu$), then there is an $\mathfrak{A}$-set, $A$,
of positive measure, such that, for example, for $x \in A$, $f'(x) > 0$ while $E^*(f \mid x) \le 0$.
We then have the contradiction

$0 < \int_A f' \, d\mu = \int_A f \, d\mu = \int_A E^*(f \mid \cdot) \, d\mu \le 0.$

Lemma 2. A necessary and sufficient condition that $f(x) = E^*(f \mid x)$ a.e. ($\mu$)
is that $f$ be essentially a function of $t$.

Again the necessity is obvious. To show sufficiency, let $f'$ be a function on $\Omega$
which is a.e. equal to $f$, and is an (unqualified) function of $t$. Define $h$ on $\Gamma$ by

$h(\tau) = f'(x), \quad t(x) = \tau.$

Then $h^* = f'$, and we have

$\int_A f \, d\mu = \int_A f' \, d\mu = \int_{A^t} h \, d\nu, \quad A \in \mathfrak{A}.$

But this implies that $h(\tau) = E(f \mid \tau)$ a.e. ($\nu$), and therefore $f(x) = E^*(f \mid x)$
a.e. ($\mu$), as was to be shown.

3. Results. For a proof of the Holder inequality that we use in establishing the 
following theorem, we refer the reader to [3] (p. 233). 

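Theorem.³ Let $s \ge 1$. Then for almost all ($\mu$) $x$,

(1) $|E^*(f \mid x)|^s \le E^*(|f|^s \mid x).$⁴

Equality holds a.e.

(i) for s = 1, if and only if $\operatorname{sgn} f$ is essentially a function of $t$;

(ii) for s > 1, if and only if $f$ is essentially a function of $t$.

Proof: Consider first the case s = 1. Let

$S = \{x \in \Omega \mid E^*(f \mid x) > 0\},$

$S' = \Omega - S.$

Then, for any $A \in \mathfrak{A}$,

$\int_A |E^*(f \mid \cdot)| \, d\mu = \int_{SA} E^*(f \mid \cdot) \, d\mu - \int_{S'A} E^*(f \mid \cdot) \, d\mu$

$= \int_{SA} f \, d\mu - \int_{S'A} f \, d\mu \le \int_A |f| \, d\mu = \int_A E^*(|f| \mid \cdot) \, d\mu.$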

³ The proof we present here was suggested by the referee, and is much shorter than
our own.

⁴ For s = 1 this inequality was used by Doob in "Regularity properties of certain families
of chance variables", Trans. Amer. Math. Soc., Vol. 47 (1940), pp. 455-486 (Theorem 0.2).

Since $A$ is arbitrary, we have the result (1) with s = 1. It is clear that the equality
sign holds a.e. ($\mu$) if and only if, except possibly for a set of measure 0, $f$ is positive
on $S$ and non-positive on $S'$; that is, if and only if $\operatorname{sgn} f(x) = \operatorname{sgn} E^*(f \mid x)$ a.e.
($\mu$). Applying Lemma 1, we have the equality condition as stated in the theorem.

Now let s > 1. To establish (1) it will suffice, by virtue of what has already
been proved for s = 1, to consider $f \ge 0$ a.e. ($\mu$). We may then argue as follows.
Unless (1) holds a.e., there is an $\mathfrak{A}$-set, $R$, of positive measure, and numbers
$a > b \ge 0$ such that for $x \in R$,

$[E^*(f \mid x)]^s \ge a,$

and

$E^*(f^s \mid x) \le b.$

But then, with an application of the Hölder inequality we meet a contradiction.
For,

$a[\mu(R)]^s \le \left\{\int_R E^*(f \mid \cdot) \, d\mu\right\}^s = \left\{\int_R f \, d\mu\right\}^s$

$\le \int_R f^s \, d\mu \cdot [\mu(R)]^{s-1} = \int_R E^*(f^s \mid \cdot) \, d\mu \cdot [\mu(R)]^{s-1}$

$\le b[\mu(R)]^s,$

which contradicts $a > b$. Thus, (1) is proved in general.

If $f(x) = E^*(f \mid x)$ a.e. ($\mu$), it is readily proved by a direct argument that then
equality holds in (1) a.e. ($\mu$). Conversely, suppose equality in (1) holds a.e.
Then we have, in fact, a.e.,

(2) $|E^*(f \mid x)| = E^*(|f| \mid x),$

and

(3) $[E^*(|f| \mid x)]^s = E^*(|f|^s \mid x).$

For brevity, denote the function $E^*(|f| \mid \cdot)$ by $v$. Since $f$ vanishes at almost all
points where $v$ vanishes, we may write $|f| = wv$, where

$w(x) = 0$ for $v(x) = 0$, and $w(x) = |f(x)|/v(x)$ for $v(x) > 0$.

(If $v$ vanishes almost everywhere, we are through.) For any $\mathfrak{A}$-measurable,
real-valued function, $u$, on $\Omega$, we have

(4) $\int_\Omega u \cdot v \, d\mu = \int_\Omega u \cdot v \cdot w \, d\mu$

when either of these integrals exists (cf. [4], p. 50, eq. (15)). Similarly, and
taking account of the equality assumption (3) we have


(5) $\int_\Omega u \cdot v^s \, d\mu = \int_\Omega u \cdot v^s \cdot w^s \, d\mu.$

In particular, consider the two functions

$u_1(x) = 1/v(x)$ for $v(x) > 0$, and $u_1(x) = 0$ for $v(x) = 0$,

and

$u_2(x) = 1/v^s(x)$ for $v(x) > 0$, and $u_2(x) = 0$ for $v(x) = 0$.

If

$S_0 = \{x \in \Omega \mid v(x) > 0\},$

it is seen that $u_1$ taken in conjunction with (4), and $u_2$ taken in conjunction with
(5), bring out

$\int_{S_0} w \, d\mu = \int_{S_0} w^s \, d\mu = \mu(S_0).$

From this it follows (e.g., by the equality condition attending the Hölder
inequality) that $w(x) = 1$ a.e. in $S_0$. Hence $|f(x)| = v(x)$ a.e. in $\Omega$. Therefore,
by (2), $|f(x)| = |E^*(f \mid x)|$ a.e. But (2) also implies, as already shown, $\operatorname{sgn} f(x) =
\operatorname{sgn} E^*(f \mid x)$ a.e. Thus, finally, we have $f(x) = E^*(f \mid x)$ a.e. Now apply
Lemma 2, and the proof of the theorem is complete.
Corollary 1. Let $s \ge 1$, and let $g_0$ denote the expectation of $f$. Then

(6) $\int_\Omega |E^*(f \mid \cdot) - g_0|^s \, d\mu \le \int_\Omega |f - g_0|^s \, d\mu.$

Equality holds

(i) for s = 1, if and only if $\operatorname{sgn}[f - g_0]$ is essentially a function of $t$;

(ii) for s > 1, if and only if $f$ is essentially a function of $t$.

This result expresses the domination over the s-th absolute central moment
of the conditional expectation of $f$ by the corresponding moment of $f$ itself. It
follows almost immediately from the theorem when we write (6) in the form

(7) $\int_\Omega |E^*(f - g_0 \mid \cdot)|^s \, d\mu \le \int_\Omega E^*(|f - g_0|^s \mid \cdot) \, d\mu.$

Thus, from the theorem we know that the integrand of the left-hand side of (7)
is a.e. $\le$ the integrand on the right. Hence (7) holds. Equality in (7) holds then
if and only if the integrands are a.e. equal. The theorem therefore directly
provides the equality conditions as stated.
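The pointwise inequality of the theorem and the integrated form of Corollary 1 can be verified directly on a finite probability space, where conditional expectation given $t$ is a weighted average over each level set of $t$. The sketch below is illustrative only; the five-point space, the values of $f$ and the labels standing for $t(x)$ are made up.

```python
def conditional_expectation(values, probs, labels):
    """E(g | t) on a finite space: dict mapping each value of t to the cell mean."""
    mass, total = {}, {}
    for g, p, lab in zip(values, probs, labels):
        mass[lab] = mass.get(lab, 0.0) + p
        total[lab] = total.get(lab, 0.0) + p * g
    return {lab: total[lab] / mass[lab] for lab in mass}

# Hypothetical finite space: probabilities, values of f, and values of t.
probs = [0.1, 0.2, 0.3, 0.15, 0.25]
f = [-2.0, 1.0, 3.0, -1.0, 0.5]
labels = ["a", "a", "b", "b", "b"]

def pointwise_inequality_holds(s):
    """Check |E(f | t)|^s <= E(|f|^s | t) on every level set of t."""
    ce_f = conditional_expectation(f, probs, labels)
    ce_abs = conditional_expectation([abs(v) ** s for v in f], probs, labels)
    return all(abs(ce_f[lab]) ** s <= ce_abs[lab] + 1e-12 for lab in ce_f)

def central_moment_pair(s):
    """Left and right sides of the integrated inequality of Corollary 1."""
    g0 = sum(p * v for p, v in zip(probs, f))
    ce_f = conditional_expectation(f, probs, labels)
    left = sum(p * abs(ce_f[lab] - g0) ** s for p, lab in zip(probs, labels))
    right = sum(p * abs(v - g0) ** s for p, v in zip(probs, f))
    return left, right

for s in (1.0, 1.5, 3.0):
    print(s, pointwise_inequality_holds(s), central_moment_pair(s))
```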

Let $\mathfrak{M} = \{\mu_\theta, \theta \in \Theta\}$ be a family of probability measures on $\mathfrak{F}$ and $t$, a sufficient
statistic for $\mathfrak{M}$ (cf. [5], p. 232, §5). Let $f$ be an unbiased estimate of the function
$g$ on $\Theta$. For each $\mu_\theta \in \mathfrak{M}$, the conditional expectation, $E_\theta(f \mid \cdot)$, of $f$ with respect
to $t$ is defined. Since conditional expectations are fully determined by conditional
probabilities (although, in general, not as usual integrals. Cf. [4], pp. 48, 49;
also [5], p. 230) it follows from the sufficiency of $t$ that there exists a function
$E(f \mid \cdot)$, on $\Gamma$, with $E_\theta(f \mid \tau) = E(f \mid \tau)$ a.e. ($\nu_\theta$) for each $\theta \in \Theta$. $E^*(f \mid \cdot)$ is again
an unbiased estimate of $g$, and we have

Corollary 2. Let $t$ be a sufficient statistic for the family $\mathfrak{M} = \{\mu_\theta, \theta \in \Theta\}$;
and $f$, an unbiased estimate of $g$. For $s \ge 1$, and each $\theta \in \Theta$,

$\int_\Omega |E^*(f \mid \cdot) - g(\theta)|^s \, d\mu_\theta \le \int_\Omega |f - g(\theta)|^s \, d\mu_\theta.$

Equality holds

(i) for s = 1, if and only if $\operatorname{sgn}[f - g(\theta)]$ is essentially ($\mu_\theta$) a function of $t$;

(ii) for s > 1, if and only if $f$ is essentially ($\mu_\theta$) a function of $t$.
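A familiar special case may make Corollary 2 concrete. The sketch below is an illustration chosen here, not taken from the note: for n independent Bernoulli(p) observations, the first observation is an unbiased estimate of p, the sum T is sufficient, and the conditional expectation of the first observation given T is T/n. The s-th absolute central moments of both estimates are computed exactly.

```python
from math import comb

def central_abs_moments(n, p, s):
    """s-th absolute central moments of X_1 and of E(X_1 | T) = T / n."""
    crude = (1 - p) * abs(0 - p) ** s + p * abs(1 - p) ** s
    improved = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * abs(k / n - p) ** s
        for k in range(n + 1)
    )
    return crude, improved

for s in (1, 1.5, 2, 3):
    crude, improved = central_abs_moments(n=4, p=0.3, s=s)
    print(s, crude, improved)
```

In this example the inequality is strict for every s shown, including s = 1, since the sign of the first observation minus p is not a function of T.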

REFERENCES

[1] E. W. Barankin, "Locally best unbiased estimates", Annals of Math. Stat., Vol. 20
(1949), pp. 477-501.

[2] D. Blackwell, "Conditional expectation and unbiased sequential estimation", Annals
of Math. Stat., Vol. 18 (1947), pp. 105-110.

[3] L. M. Graves, The Theory of Functions of Real Variables, McGraw-Hill, 1946.

[4] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der
Mathematik, Vol. 2 (1933).

[5] P. R. Halmos and L. J. Savage, "Application of the Radon-Nikodym theorem to the
theory of sufficient statistics", Annals of Math. Stat., Vol. 20 (1949), pp. 225-241.


NOTE ON CONSISTENT ESTIMATES OF THE LINEAR STRUCTURAL
RELATION BETWEEN TWO VARIABLES¹

By Elizabeth L. Scott
University of California, Berkeley

1. Introduction. The purpose of this note is to present another case in which 
the structural linear relation between two observable random variables may be 
consistently estimated. Of the recent papers on this subject I wish to mention the 
paper by Wald [1], which contains a history of the work done on the problem, 
and the more recent paper by Housner and Brennan [2]. Also relevant is the
important result due to Reiersøl [3], [4].

2. Statement of problem. Assume that the two observable random variables
$x$ and $y$ have the structure

(1) $x = \xi + u, \quad y = \alpha + \beta\xi + v,$

where $\alpha$ and $\beta$ are unknown parameters to be estimated, and $\xi$, $u$ and $v$ are
completely independent random variables. The latter two variables, interpreted
as the random errors of measurement, are assumed to vary normally
about zero with unknown variances $\sigma_u^2$ and $\sigma_v^2$, respectively.

¹ Paper prepared with partial support of the Office of Naval Research.
The results summarized were presented in a discussion held at the Cleveland Meeting
of the Institute of Mathematical Statistics, December, 1948.

An increasing number n of completely independent pairs of simultaneous
values of $x$ and $y$ are to be observed

(2) $(x_i, y_i), \quad i = 1, 2, \ldots, n,$

so that each pair $(x_i, y_i)$ corresponds to a value of the unobservable random
variable $\xi$ which is independent of the value of $\xi$ corresponding to any other
pair $(x_j, y_j)$, $i \ne j$.

It is well known that if the distribution of $\xi$ is normal then the parameters
$\alpha$, $\beta$, $\sigma_u^2$ and $\sigma_v^2$ are unidentifiable. Reiersøl proved [4] that these parameters are
identifiable in all other cases. Wald and Housner and Brennan found consistent
estimates of these parameters assuming that, although the particular values of
$\xi$ are not known exactly, a certain amount of knowledge concerning the values
of $\xi$ is available. The present note gives a method for obtaining a consistent
estimate of $\beta$, which is the key to the problem of estimating the four parameters,
for the case where it is known that a specified central moment of the distribution
of $\xi$ exists and differs from that of the normal distribution.

Since work on this subject continues, the present brief note deals particularly
with the simplest case, when one of the odd central moments of $\xi$ exists and
differs from the "normal" value, zero. It will be observed that the hypotheses
made here are of entirely different character from those adopted by other writers.
The present note postulates knowledge concerning a moment of the distribution
of $\xi$, whereas the papers quoted postulate some knowledge of the particular
values assumed by $\xi$. The method adopted was suggested by a remark made by
Neyman [5] in 1936.

3. Preliminary theorems. Let

(3) $x_{\cdot} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad y_{\cdot} = \frac{1}{n}\sum_{i=1}^{n} y_i,$

and let $b$ be an arbitrary real number.

Theorem 1: If $\mu_3$, the third central moment of $\xi$, exists then the arithmetic
mean

(4) $F_{n,1}(b) = \frac{1}{n}\sum_{i=1}^{n} [y_i - y_{\cdot} - b(x_i - x_{\cdot})]^3$

converges in probability to

(5) $(\beta - b)^3 \mu_3.$





Proof. Simple algebra gives

(6) $F_{n,1}(b) = (\beta - b)^3 \frac{1}{n}\sum_{i=1}^{n} (\xi_i - \xi_{\cdot})^3$

$\quad + 3(\beta - b)^2 \frac{1}{n}\sum_{i=1}^{n} (\xi_i - \xi_{\cdot})^2 [v_i - v_{\cdot} - b(u_i - u_{\cdot})]$

$\quad + 3(\beta - b) \frac{1}{n}\sum_{i=1}^{n} (\xi_i - \xi_{\cdot}) [v_i - v_{\cdot} - b(u_i - u_{\cdot})]^2$

$\quad + \frac{1}{n}\sum_{i=1}^{n} [v_i - v_{\cdot} - b(u_i - u_{\cdot})]^3.$

It is obvious that further expansion will express $F_{n,1}(b)$ in terms of averages of
the type

(7) $\frac{1}{n}\sum_{i=1}^{n} \xi_i^p u_i^q v_i^r$

with $p + q + r \le 3$. Since all the terms over which each average is taken are
completely independent, follow the same law and possess finite expectations, the
familiar theorem of Khintchine assures that, as n is increased, each average (7)
tends in probability to its expectation. Using Slutsky's theorem (see Cramér [6],
p. 255), we conclude that $F_{n,1}(b)$ tends in probability to the limit obtained by
replacing each average in the expansion (6) by its expectation and then letting
$n \to \infty$. The computations are easy and give

(8) $\lim_{n \to \infty} p \, F_{n,1}(b) = (\beta - b)^3 \mu_3.$

Q.E.D.

Let $\{X_n\}$ denote a sequence of observable random variables (multivariate or
not) such that the distribution function of $X_n$ depends on the parameters
$\theta_i$ with $a_i < \theta_i < b_i$, $i = 1, 2, \ldots, s$. Furthermore, let $\lambda$ denote a real variable
and $\{\phi_n(X_n, \lambda)\}$ a sequence of functions of the arguments $X_n$ and $\lambda$ defined for
all possible values of $X_n$ and for all values of $\lambda$ within the limits $a_1 \le \lambda \le b_1$.

Theorem 2: If the sequence of functions $\{\phi_n(X_n, \lambda)\}$ has the following properties:

(i) whatever be the true values $\theta_1', \theta_2', \ldots, \theta_s'$ of the parameters $\theta_i$ within the
limits $a_i < \theta_i < b_i$, $i = 1, 2, \ldots, s$, as n is increased, the sequence $\{\phi_n(X_n, \lambda)\}$
tends in probability to a function $f(\lambda, \theta_1')$ of arguments $\lambda$ and $\theta_1'$ only,

(ii) whatever be $\delta > 0$, there exist in $(a_1, b_1)$ two numbers $\lambda_1$ and $\lambda_2$, each differing
from $\theta_1'$ by less than $\delta$ and such that the product $f(\lambda_1, \theta_1') f(\lambda_2, \theta_1')$ is negative,

(iii) for every n and every possible value $x_n$ of $X_n$, the function $\phi_n(x_n, \lambda)$ is
continuous with respect to $\lambda$ for $a_1 \le \lambda \le b_1$,

then whatever be $\epsilon > 0$ and $\eta > 0$ there exists a number $N_{\epsilon,\eta}$ such that for $n > N_{\epsilon,\eta}$
the probability that the equation $\phi_n(X_n, \lambda) = 0$ has a root between $\theta_1' - \epsilon$ and
$\theta_1' + \epsilon$ exceeds $1 - \eta$.

Proof: Let $\epsilon > 0$ and $\eta > 0$ be two arbitrarily small numbers. Let $\lambda_1$ and
$\lambda_2$ be two numbers such that $\lambda_i \in (a_1, b_1)$ and $|\theta_1' - \lambda_i| < \epsilon$, $i = 1, 2$, and such
that $f(\lambda_1, \theta_1') < 0 < f(\lambda_2, \theta_1')$. Select $N_{\epsilon,\eta}$ so large that for $n > N_{\epsilon,\eta}$ the probability
of having simultaneously

(9) $|\phi_n(X_n, \lambda_i) - f(\lambda_i, \theta_1')| < \tfrac{1}{2} |f(\lambda_i, \theta_1')|$ for $i = 1, 2$

differs from unity by less than $\eta$. It is clear that if the inequalities (9) are satisfied
for any particular value $x_n$ of $X_n$, then

(10) $\phi_n(x_n, \lambda_1) < 0 < \phi_n(x_n, \lambda_2)$

and the continuity of $\phi_n(x_n, \lambda)$ for $\lambda \in (a_1, b_1)$ implies that there exists a number
$\lambda(x_n)$ between $\lambda_1$ and $\lambda_2$ such that $\phi_n(x_n, \lambda(x_n)) = 0$. Obviously $|\theta_1' - \lambda(x_n)| < \epsilon$.
Thus, whatever be $\epsilon, \eta > 0$, there exists a number $N_{\epsilon,\eta}$ such that the probability
that $\phi_n(X_n, \lambda)$ has a root in the interval $(\theta_1' - \epsilon, \theta_1' + \epsilon)$ exceeds $1 - \eta$ provided
$n > N_{\epsilon,\eta}$. This proves Theorem 2.

Theorem 2 is treated as a convenient lemma on which to base the proof of the
existence of a consistent estimate of the parameter $\beta$ in (1). It is obvious, however,
that this Theorem has an independent interest of its own.

4. Consistent estimates of the structural parameter $\beta$. Referring to the general
set-up of the problem of estimating the structural parameter $\beta$ in (1) and using
the notation (2) and (3), we prove the following theorems.

Theorem 3: If the third central moment $\mu_3$ of $\xi$ exists and differs from zero, then
the equation

(11) $F_{n,1}(b) = \frac{1}{n}\sum_{i=1}^{n} [y_i - y_{\cdot} - b(x_i - x_{\cdot})]^3 = 0$

has a root $\hat{b}$ which is a consistent estimate of $\beta$.

Proof: According to Theorem 1, whatever be $b$ and $\beta$, the stochastic limit of
$F_{n,1}(b)$ is $(\beta - b)^3 \mu_3$ and changes its sign as $b$ passes through the value $\beta$. Theorem
2 implies then that whatever be $\epsilon, \eta > 0$, there exists a number $N_{\epsilon,\eta}$ such that
for $n > N_{\epsilon,\eta}$ the probability that at least one of the roots of (11) will lie within
$\beta - \epsilon$ and $\beta + \epsilon$ is greater than $1 - \eta$. This proves the theorem.
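Theorem 3 can be tried out on simulated data. The sketch below is only an illustration under assumptions chosen here: $\xi$ is drawn from an exponential law (third central moment 2), the errors are normal with standard deviation 0.5, and a real root of the cubic (11) is located by bisection on a wide bracket. The function name, the bracket and the sample size are hypothetical choices, not part of the note.

```python
import random

def cubic_moment_root(xs, ys, lo=-100.0, hi=100.0):
    """Find a real root b of mean((dy - b * dx)^3) = 0 by bisection."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    dx = [x - xbar for x in xs]
    dy = [y - ybar for y in ys]
    # c[j] is the sample mean of dx^j * dy^(3 - j).
    c = [sum(a**j * b ** (3 - j) for a, b in zip(dx, dy)) / n for j in range(4)]

    def F(b):  # binomial expansion of mean((dy - b * dx)^3)
        return c[0] - 3 * b * c[1] + 3 * b * b * c[2] - b**3 * c[3]

    if F(lo) * F(hi) > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F(lo) * F(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

rng = random.Random(0)
alpha, beta, n = 1.0, 2.0, 20000
xi = [rng.expovariate(1.0) for _ in range(n)]          # skewed, mu_3 = 2
xs = [t + rng.gauss(0.0, 0.5) for t in xi]             # x = xi + u
ys = [alpha + beta * t + rng.gauss(0.0, 0.5) for t in xi]  # y = alpha + beta*xi + v
b_hat = cubic_moment_root(xs, ys)
print(b_hat)
```

Because the limiting function has a triple root at $\beta$, the root of the sample equation moves slowly toward $\beta$ as n grows, so only rough agreement should be expected at moderate sample sizes.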

Generally, let $\mu_m$ denote the m-th central moment of $\xi$.

Theorem 4: If the distribution of $\xi$ has moments up to and including order $2m + 1$
and if at least one of the first m odd central moments $\mu_{2k+1}$ differs from zero, $k =
1, 2, \ldots, m$, then the equation

(12) $F_{n,m}(b) = \frac{1}{n}\sum_{i=1}^{n} [y_i - y_{\cdot} - b(x_i - x_{\cdot})]^{2m+1} = 0$

has a root $\hat{b}$ which is a consistent estimate of $\beta$.

Proof: The proof of Theorem 4 exactly follows the lines of that of Theorem 3.
Using (1), (2) and (3), we write

(13) $F_{n,m}(b) = \sum_{k=0}^{2m+1} C_{2m+1}^{k} (\beta - b)^k \left[\frac{1}{n}\sum_{i=1}^{n} (\xi_i - \xi_{\cdot})^k \{v_i - v_{\cdot} - b(u_i - u_{\cdot})\}^{2m+1-k}\right].$

It is easily seen that, as $n \to \infty$, $F_{n,m}(b)$ tends in probability to the limit

(14) $\lim_{n \to \infty} p \, F_{n,m}(b) = (\beta - b)\,\psi(\beta - b),$

where $\psi(\beta - b)$ is a linear combination of even powers of $(\beta - b)$ with at least
one coefficient different from zero. It follows that the stochastic limit of $F_{n,m}(b)$
changes its sign as $b$ passes through $\beta$ and the proof is completed by reference to
Theorem 2.

Note that the stochastic limit of the first derivative of $F_{n,m}(b)$, evaluated at
$b = \beta$, is zero, which is unfortunate. Furthermore, the order of contact of $F_{n,m}(b)$
at $b = \beta$ increases with the order of the first odd central moment of $\xi$ which
differs from zero. Therefore, the precision of estimating $\beta$ may be expected to be
better when the low odd central moments are not zero. Without narrowing the
generality of the case considered, it is difficult to make an evaluation of the
precision of the estimates obtained. Thus, for example, the familiar method of
evaluating the asymptotic variance requires the knowledge of higher moments of
$\xi$ than those considered here. For similar reasons, it is thus far impossible to
speak of the relative efficiency of the estimates found. For this purpose it would
be necessary to determine first the measure of the precision of the best estimate
whose consistency persists independently of the distribution of $\xi$ provided only
that at least one odd central moment differs from zero.

Once the consistent estimate $\hat{b}$ of $\beta$ is obtained, there is no particular difficulty
in obtaining consistent estimates of the other parameters.

J. Neyman has pointed out [7] that Theorem 2 may be used as the basis for
a very elementary proof of the consistency of maximum likelihood estimates.

REFERENCES 

[1] Abraham Wald, "The fitting of straight lines if both variables are subject to error,"
Annals of Math. Stat., Vol. 11 (1940), p. 284.

[2] G. W. Housner and J. F. Brennan, "The estimation of linear trends," Annals of Math.
Stat., Vol. 19 (1948), p. 380.

[3] Olav Reiersøl, "Confluence analysis by means of lag moments and other methods of
confluence analysis," Econometrica, Vol. 9 (1941), p. 1.

[4] Olav Reiersøl, "Identifiability of a linear relation between variables which are
subject to error," Cowles Commission Discussion Papers, Statistics, No. 337 (1949).

[5] J. Neyman, Jour. Roy. Stat. Soc., Vol. 100 (1937), p. 56.

[6] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[7] J. Neyman, First Course in Probability and Statistics, Vol. 2, forthcoming.





ON MULTINOMIAL DISTRIBUTIONS WITH LIMITED FREEDOM: 
A STOCHASTIC GENESIS OF PARETO’S AND PEARSON’S CURVES 


By Maria Castellani
University of Kansas City 


1. A multinomial law with limited freedom: Distribution functions of statistical
equilibrium. We intend to consider here a convenient model of statistical
mechanics, which by generalization of an approach used by Cantelli [1], shall
give us either Pareto's or Pearson's curves. Let us imagine that N elements
($N > k > 3$) have to be randomly distributed in a set of k continuous intervals
$t_i$ ($i = 1, 2, \ldots, k$) in $R_1$, the "a priori" probability associated with $t_i$ being
$p_i$, for $\sum_{i=1}^{k} p_i = 1$. Assuming that the elements have no preferences, they move
freely under the law of chance taking different configurations $(n_1, n_2, \ldots, n_k)$
with probabilities $P(n_1, n_2, \ldots, n_k)$, $n_i$ being the number of elements placed
in $t_i$ and $\sum_{i=1}^{k} n_i = N$. The random variable $Y(t)$ representing the total number
of configurations $(n_1, n_2, \ldots, n_k)$, therefore obeys a multinomial law with
$k - 1$ degrees of freedom, viz.:

(1) $P(n_1, n_2, \ldots, n_k) = \frac{N!}{\prod_{i=1}^{k} n_i!} \prod_{i=1}^{k} p_i^{n_i}, \quad \sum_{i=1}^{k} n_i = N, \quad \sum_{i=1}^{k} p_i = 1.$

We shall proceed to admit that the elements are not free, but that they have
preference in the choice of a suitable interval. This fact we associate with the
assumption that some forces of attraction are made to play in each interval.
For the sake of simplicity we shall consider that there are two independent
forces, say $\mu(t)$ and $\nu(t)$, whose convenient potential functions are respectively
$f(t)$ and $\varphi(t)$, where

(2) $\frac{df(t)}{dt} = \mu(t), \quad \frac{d\varphi(t)}{dt} = \nu(t).$

These potential functions we may, for instance, associate with the significance
of a certain quanta whose total is to be distributed among the elements and whose
significance must be established by consideration of the particular statistical
experiment. It is then admissible, at least in our first approach, to assume these
potential quantities to have a total constant magnitude, viz., $\sum n_i f(t_i) = H_1$,
$\sum n_i \varphi(t_i) = H_2$, where $H_1$ and $H_2$ are appropriate constants. This condition is
analogous to the assumption in statistical mechanics of the preservation of
energy. This analogy enables us to follow classical methods. We shall call our
method the method of "intervals of energy." Let us say that our system reaches
its canonic state when $P(n_1, n_2, \ldots, n_k)$ is a maximum [2]. When this state occurs with
a probability close to the value one, we may say it is in statistical equilibrium.
It is well known that $P(n_1, n_2, \ldots, n_k)$ reaches its maximum when

(3) $\delta P(n_1, n_2, \ldots, n_k) = 0$ or $\delta \log P(n_1, n_2, \ldots, n_k) = 0.$





Performing as usual, for example, as in [3], we ultimately obtain:

(4) $n_i = N p_i e^{-a - b f(t_i) - c \varphi(t_i)},$

where a, b, and c are arbitrary constants.

If N is sufficiently large, $n_i/N$ may be considered a probability, and precisely
the probability for $Y(t)$ to score $n_i$ times the value $t_i$ when the canonical state
is reached. The problem may then be extended assuming that a continuous
function may interpolate the discrete values $n_1, n_2, \ldots, n_k$. Putting $y = n_i/N$,
assuming for the sake of simplicity that $f(t)$ and $\varphi(t)$ along with their derivatives
are continuous functions of t, and grouping these constants into a single K,
formula (4) becomes:

(5) $y = K e^{-b f(t) - c \varphi(t)};$

then:

(6) $\frac{d \log y}{dt} = \frac{1}{y}\frac{dy}{dt} = -b\frac{df(t)}{dt} - c\frac{d\varphi(t)}{dt} = -b\mu(t) - c\nu(t).$

Equation (6) is a generalization of the familiar Pearson differential equation
which generates his system of curves. It is obvious that (6) may determine a
large set of frequency curves, depending on the form of $f(t)$ and $\varphi(t)$.
The above analysis may be extended to any number of acting forces provided
they are less than $k - 1$ in number.

2. Stochastic genesis of the Pareto and Pearson curves. We shall next show
how the Pareto and Pearson curves belong to this family of frequency curves.
The Pearsonian system of curves is derived by comparing its differential
equation with (6) to determine in these the most suitable functions for $\mu(t)$ and
$\nu(t)$. Thus,

(7) $\frac{1}{y}\frac{dy}{dt} = \frac{t + a}{\beta_1 + \beta_2 t + \beta_3 t^2} = -b\mu(t) - c\nu(t).$

Corresponding to the decomposition into partial fractions of the middle term,
we have two sets of curves. When

$\beta_1 + \beta_2 t + \beta_3 t^2 = \beta_3(t - \gamma_1)(t - \gamma_2)$

and $\gamma_1$, $\gamma_2$ are real numbers, then

(8) $-b\mu(t) = \frac{\gamma_1 + a}{\beta_3(\gamma_1 - \gamma_2)(t - \gamma_1)} = \frac{p}{t - \gamma_1}, \quad -c\nu(t) = \frac{\gamma_2 + a}{\beta_3(\gamma_2 - \gamma_1)(t - \gamma_2)} = \frac{q}{t - \gamma_2},$

p and q being suitable grouping constants.

Under these assumptions two forces are acting in each "class of energy",
each one being inversely proportional to the distance of the interval from some origin.

Substituting (8) into (6) and integrating, we obtain corresponding to the first of
(5), after grouping the exponential constants into K,

$y = K(t - \gamma_1)^p (t - \gamma_2)^q,$

where K, $\gamma_1$, $\gamma_2$, p, q also have the significance of statistical constants according
to which we obtain Pearson's curves of Type I or VI.
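The passage from the partial fractions to the closed-form curve can be checked numerically. The sketch below assumes the real-root case, with the quadratic written as $\beta_3(t - \gamma_1)(t - \gamma_2)$ and made-up parameter values; it computes p and q from the partial-fraction coefficients and compares a finite-difference logarithmic derivative of $K|t - \gamma_1|^p |t - \gamma_2|^q$ with the right-hand side of the Pearson equation. All names are hypothetical.

```python
import math

def pearson_exponents(a, beta3, gamma1, gamma2):
    """Partial-fraction coefficients p, q of (t + a) / (beta3 (t - g1)(t - g2))."""
    p = (gamma1 + a) / (beta3 * (gamma1 - gamma2))
    q = (gamma2 + a) / (beta3 * (gamma2 - gamma1))
    return p, q

def log_density(t, p, q, gamma1, gamma2, log_k=0.0):
    """log of K |t - gamma1|^p |t - gamma2|^q."""
    return log_k + p * math.log(abs(t - gamma1)) + q * math.log(abs(t - gamma2))

# Illustrative (made-up) constants.
a, beta3, gamma1, gamma2 = 0.5, 1.0, -1.0, 2.0
p, q = pearson_exponents(a, beta3, gamma1, gamma2)

def pearson_rhs(t):
    """Right-hand side (t + a) / (beta3 (t - gamma1)(t - gamma2))."""
    return (t + a) / (beta3 * (t - gamma1) * (t - gamma2))

def numeric_log_derivative(t, h=1e-6):
    """Central-difference estimate of d log y / dt."""
    upper = log_density(t + h, p, q, gamma1, gamma2)
    lower = log_density(t - h, p, q, gamma1, gamma2)
    return (upper - lower) / (2 * h)

for t in (-0.5, 0.3, 1.5):
    print(t, numeric_log_derivative(t), pearson_rhs(t))
```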

When $\beta_3 = 0$, we have by the same process:

$-b\mu(t) = \frac{1}{\beta_2} = q_2, \quad -c\nu(t) = \frac{a - \beta_1/\beta_2}{\beta_1 + \beta_2 t} = \frac{q_1}{p + t}.$

Hence, by grouping together the statistical constants under K, p, $q_1$, $q_2$, we
obtain:

$y = K(p + t)^{q_1} e^{q_2 t},$

which is a Pearson curve of Type III.¹ In each class interval two forces are acting;
one is constant and the other is inversely proportional to the distance of the
interval from a fixed origin.

We obtain a Pareto curve when in (8) either p or q is zero. Under the indicated
assumptions the Pareto income distribution curve appears in a new light. If the
acting forces are reduced to one, and this one force is inversely proportional to the
distance of the interval from some origin, the Pareto curve represents a special
case of the Pearsonian curve.

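In (7), we now consider the decomposition of the Pearson function for the case
where the denominator does not have real roots. This decomposition may be
indicated as follows:

$\frac{t + a}{\beta_1 + \beta_2 t + \beta_3 t^2} = \frac{t + \beta_2/2\beta_3}{\beta_3\{(t + \beta_2/2\beta_3)^2 + (\beta_1/\beta_3 - \beta_2^2/4\beta_3^2)\}} + \frac{a - \beta_2/2\beta_3}{\beta_3\{(t + \beta_2/2\beta_3)^2 + \beta_1/\beta_3 - \beta_2^2/4\beta_3^2\}}.$

Setting

$\frac{\beta_2}{2\beta_3} = p_1, \quad a - \frac{\beta_2}{2\beta_3} = p_2, \quad \frac{\beta_1}{\beta_3} - \frac{\beta_2^2}{4\beta_3^2} = q^2,$

we have

$-b\mu(t) = \frac{t + p_1}{\beta_3\{(t + p_1)^2 + q^2\}}, \quad -c\nu(t) = \frac{p_2}{\beta_3[(t + p_1)^2 + q^2]}.$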

¹ A. L. Bowley has found in his well-known analysis of food expenditures of urban
families, that the distribution of weekly family expenditures can be best expressed by a
Pearson curve of Type III. This is not surprising, since it is exactly a case where we can
assume the joint effect of a constant factor and another factor acting in inverse proportion
to the interval (again in the sense of the distance from a suitable origin). The constant
factor in our case is the human need of food, while the factor acting inversely to the interval
can be taken as a response to prices. See [4].



These we may interpret as forces of the toman type. By grouping the
statistical constants appropriately under K, $p_1$, q, $n_0$, $n_1$, $n_2$, we derive from
(7) the following equation.

This is the familiar Pearson curve of Type IV.

Other distributions of the same family can be easily found by the same method.
3. The frequency curves and their statistical equilibrium. The conclusive step
in this analysis is in finding the probability of the most likely configuration. By
generalizing a process of statistical mechanics first used by Castelnuovo [2], we
assume any configuration $(n_1', n_2', \ldots, n_k')$ slightly different from the most
probable $(n_1, n_2, \ldots, n_k)$ (the canonical configuration). Setting

$n_i' = x_i + n_i \quad (i = 1, 2, \ldots, k),$

we have by conditions (1) and (3):

(9) $\sum_{i=1}^{k} x_i = 0, \quad \sum_{i=1}^{k} x_i f(t_i) = 0, \quad \sum_{i=1}^{k} x_i \varphi(t_i) = 0,$

iv;...;. ...d - 


The sum of the values of $P(n_1', n_2', \ldots, n_k')$ will give us the total probability of scoring
an $n_i'$ slightly different from $n_i$. Let us designate by $\Pi$ the total probability of
having $P(n_1', n_2', \ldots, n_k')$ satisfying all above conditions. By following Castelnuovo's
method [2], [5], we obtain:

$\Pi = \sum P(n_1, n_2, \ldots, n_k) \exp\left(-\tfrac{1}{2}\sum_{i=1}^{k} \frac{x_i^2}{n_i}\right).$

We determine all integral sets of $x_i$ compatible with (9), and with a condition of
size

$\sum_{i=1}^{k} \frac{x_i^2}{n_i} < 2u_0.$

By a well-known process [2], [5], for any $u_0$,

$\Pi = \frac{1}{\Gamma\left(\frac{k-3}{2}\right)} \int_0^{u_0} u^{\frac{k-5}{2}} e^{-u} \, du.$

This is the familiar Chi-square distribution function with $(k - 3)$ degrees of
freedom. By considering $u_0$ as increasing with N, we can conclude that

$\lim \Pi = 1.$
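The approach of $\Pi$ to one can be tabulated. The sketch below assumes that $\Pi$ is the Chi-square distribution function with $k - 3$ degrees of freedom evaluated at $2u_0$, which equals the regularized lower incomplete gamma function with shape $(k - 3)/2$ at $u_0$; it evaluates that function by its power series. The function names and the choice k = 7 are illustrative.

```python
import math

def regularized_lower_gamma(shape, x, terms=500):
    """P(shape, x) = (1 / Gamma(shape)) * integral_0^x u^(shape-1) e^(-u) du."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    for n in range(1, terms):
        term *= x / (shape + n)  # next term of the series for the integral
        total += term
    return total * math.exp(-x + shape * math.log(x) - math.lgamma(shape))

def equilibrium_probability(k, u0):
    """Assumed form of Pi: chi-square CDF with k - 3 d.f. evaluated at 2 * u0."""
    return regularized_lower_gamma((k - 3) / 2.0, u0)

for u0 in (1.0, 2.0, 5.0, 10.0, 40.0):
    print(u0, equilibrium_probability(7, u0))
```

With k = 7 the shape parameter is 2 and the value reduces to $1 - e^{-u_0}(1 + u_0)$, which gives a simple check on the series.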


The state of maximum likelihood has a real significance only if it is almost certain
that we will obtain either such a state or any one practically equivalent to it.
This occurs when the state of maximum probability has little chance to change;
it is a so-called stationary state or state of statistical equilibrium. It would mean a
great deal if we could be able to say through how many states the statistical
phenomena must pass before attaining its equilibrium, or in other words, whether
the ergodic hypothesis of the kinetic theory of gas applies to certain social or
economic phenomena. We will not go further into this now; the results obtained
here must be considered as an initial exploratory step, which does permit us,
however, to end with the following conclusive statement:

If N elements, provided N is large enough, are distributed at random in
k class "intervals of energy", it is highly probable that they will approach
a configuration of statistical equilibrium, a distribution of maximum
probability. Pareto's and Pearson's curves represent special configurations of
statistical equilibrium in a stochastic system.


REFERENCES

[1] F. P. Cantelli, "Sulla deduzione delle leggi di frequenza da considerazioni di probabilità," Metron, Vol. 1 (1920), No. 3.
[2] G. Castelnuovo, Calcolo delle Probabilità, Roma, 1919.
[3] R. B. Lindsay, Introduction to Physical Statistics, New York, 1941.
[4] A. L. Bowley, Elements of Statistics, London, 1926.
[5] J. L. Coolidge, An Introduction to Mathematical Probability, Oxford, London, 1925.


ON THE COMPLETELY UNBIASSED CHARACTER OF TESTS OF
INDEPENDENCE IN MULTIVARIATE NORMAL SYSTEMS

By R. D. Narain

Indian Council of Agricultural Research

1. Introductory. To prove the unbiassed character of likelihood ratio tests
like the test of significance of the multiple correlation coefficient or Hotelling's
test, Daly [1] used the non-null frequency distributions of these test criteria.
This leads to obvious difficulties when tackling the general regression problem
and the test of independence of several sets of variates, and Daly [1] has shown
only their locally unbiassed character.

This paper demonstrates an approach which does not require an explicit
knowledge of the frequency distribution of the test criteria, and it has been
possible to prove that the likelihood ratio test for the general regression problem
and the Wilks' criterion for independence of sets of variates are completely
unbiassed. The argument proceeds in a chain, the unbiassedness of the Wilks
criterion following ultimately from the unbiassedness of the t-test. The link up has
been achieved by working with a chain of conditional distribution densities, a
principle employed earlier by the author [3], [4] in presenting a unified distribution
theory of the common statistical coefficients relevant to normal theory.





2. The t-test. As the simplest demonstration of the procedure, which is applicable
generally, consider the $t$-test for the significance of the mean of a normal
population. Let the frequency function of a sample of size $n$ be

$$(2\pi)^{-n/2} \exp\left[-\tfrac{1}{2} \sum_{i=1}^{n} (x_i - m)^2\right] \prod_{i=1}^{n} dx_i. \tag{1}$$

The region $W - w$ complementary to the critical region $w$ for testing the hypothesis

$$m = 0$$

is given by

$$|\bar{x}| < k\chi,$$

where $k$ is a positive constant depending on the size of $w$ and

$$n\bar{x} = \sum_{i=1}^{n} x_i, \qquad \chi^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2.$$


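We write

$$I(m) = \int_0^\infty \left[\sqrt{\frac{n}{2\pi}} \int_{-k\chi}^{k\chi} e^{-\frac{n}{2}(\bar{x} - m)^2}\, d\bar{x}\right] f(\chi^2)\, d(\chi^2),$$

where

$$f(\chi^2)\, d(\chi^2)$$

is the frequency function of $\chi^2$, which is distributed independently of $\bar{x}$. To show
that the test is completely unbiassed is equivalent to showing that

$$I(m) < I(0) \quad \text{for all values of } m.$$

We have

$$\frac{dI}{dm} = \int_0^\infty \sqrt{\frac{n}{2\pi}} \left[e^{-\frac{n}{2}(k\chi + m)^2} - e^{-\frac{n}{2}(k\chi - m)^2}\right] f(\chi^2)\, d(\chi^2),$$

which is positive or negative according as $m$ is negative or positive. Therefore

$$I(m) < I(0).$$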
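
The inequality can be illustrated numerically. The following is only a sketch under stated assumptions: unit variance, an arbitrary sample size n = 5 and an arbitrary constant k = 0.6 (neither is taken from the paper), and hypothetical function names. For each simulated value of $\chi$ the inner integral over $\bar{x}$ is evaluated exactly with the normal distribution function, and the same random draws are reused for every $m$, so the comparison of $I(m)$ with $I(0)$ is not blurred by sampling noise.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def acceptance_probability(m, n=5, k=0.6, draws=20000, seed=0):
    """Monte Carlo estimate of I(m) = P(|xbar| < k * chi) for a N(m, 1) sample of size n.

    chi^2 = sum (x_i - xbar)^2 is simulated as a chi-square variate with n - 1
    degrees of freedom; given chi, xbar ~ N(m, 1/n) is integrated out exactly.
    The values n = 5 and k = 0.6 are illustrative assumptions only.
    """
    rng = random.Random(seed)  # same seed for every m: common random numbers
    root_n = math.sqrt(n)
    total = 0.0
    for _ in range(draws):
        chi = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n - 1)))
        upper = root_n * (k * chi - m)
        lower = root_n * (-k * chi - m)
        total += normal_cdf(upper) - normal_cdf(lower)
    return total / draws


if __name__ == "__main__":
    for m in (0.0, 0.25, 0.5, 1.0):
        print(m, round(acceptance_probability(m), 4))
```

The estimated probability of falling in $W - w$ is largest at $m = 0$ and decreases as $|m|$ grows, which is the behaviour the argument above establishes.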

3. The E- and R- tests. Let the frequency function of n observations of a 
random variate be 

(3) (2W)^‘'« CM> r -t (*„ - f n ,n dx„. 

\ r W, I / J 1 

With the usual notation for partial variates in regression analysis, the critical
region $w$ based on the likelihood ratio test for the hypothesis

$$0 = \beta_m = \beta_{m+1} = \cdots = \beta_{p-1}, \qquad m < p - 1,$$





is given by


$$1 - E^2 = \frac{\sum_i x^2_{ip\cdot(12\cdots p-1)}}{\sum_i x^2_{ip\cdot(12\cdots m-1)}} < \text{a positive constant}.$$


It can be shown [2], [3] that this ratio can be expressed in the form

$$1 - E^2 = \frac{\chi^2}{\chi^2 + \sum_{r=m}^{p-1} z_r^2},$$

where the frequency function of $\chi^2$ and the $z_r$ is

1 


(27rrr'"''' 


(4) 


- exp 


, — P + 

"or 1^’ + S (2r - Vr) 

( rui 


(x“) 


,2\((ll-fl-J)y2) ./S' 


d{x) H dzr 


The hypothesis to be tested then becomes

$$0 = \eta_m = \eta_{m+1} = \cdots = \eta_{p-1}.$$

The region $W - w$ complementary to $w$ is given by

$$\sum_{r=m}^{p-1} z_r^2 < k\chi^2,$$

where $k$ is a positive constant determined by the size of $w$. Denote by
$I(\eta_{p-1}, \eta_{p-2}, \ldots, \eta_m)$ the integral of (4) over the region $W - w$. Differentiating
$I$ with respect to $\eta_{p-1}$, performing the integration with respect to $z_{p-1}$ and
arguing exactly as in section 2 above we obtain

$$I(\eta_{p-1}, \eta_{p-2}, \ldots, \eta_m) < I(0, \eta_{p-2}, \ldots, \eta_m).$$

Note that $z_{p-2}$ is distributed independently of $z_{p-1}$. Therefore starting with
$\eta_{p-1} = 0$ in (4) and considering the integration with respect to $z_{p-2}$ first, we
obtain as before $I(0, \eta_{p-2}, \ldots, \eta_m) \le I(0, 0, \ldots, \eta_m)$ and thus finally
$I(\eta_{p-1}, \eta_{p-2}, \ldots, \eta_m) < I(0, 0, \ldots, 0)$, which proves the completely unbiassed
character of the $E^2$-test. The test of significance of the multiple correlation
coefficient with any number of the predicting variates being fixed or random may
be considered as a corollary to the above. We have only to multiply the frequency
function (3) by a factor $dF$ representing the frequency function of the random
predicting variates (which need not be necessarily normal). This does not affect
either the test criterion or the arguments showing its unbiassedness. The test of
significance of the multiple correlation coefficient is thus completely unbiassed.





4. The general regression problem. Given the distribution,

$$(2\pi)^{-n(p-m)/2}\, |\alpha^{rs}|^{n/2} \exp\Big[-\tfrac{1}{2} \sum_{r,s} \alpha^{rs} \sum_{i} \Big(x_{ir} - \sum_{h} \beta_{rh} x_{ih}\Big)\Big(x_{is} - \sum_{h} \beta_{sh} x_{ih}\Big)\Big] \prod_{i,r} dx_{ir}, \tag{5}$$

$$i = 1, 2, \ldots, n; \qquad h = 1, 2, \ldots, l, l+1, l+2, \ldots, m; \qquad r, s = m+1, m+2, \ldots, p; \qquad n > p > m > l,$$

where the matrix $\|x_{ih}\|$ is of rank $m$. The hypothesis $H$ to be tested is

$$\beta_{rv} = 0, \qquad r = m+1, m+2, \ldots, p, \qquad v = l+1, l+2, \ldots, m.$$

The likelihood ratio test gives the critical region defined by

$$\lambda = \frac{|a_{rs}|}{|b_{rs}|} < \text{a positive constant},$$

where, with the usual regression notation for partial variates,

$$a_{rs} = \sum_{i=1}^{n} x_{ir\cdot(12\cdots m)}\, x_{is\cdot(12\cdots m)}, \qquad b_{rs} = \sum_{i=1}^{n} x_{ir\cdot(12\cdots l)}\, x_{is\cdot(12\cdots l)}, \qquad r, s = m+1, m+2, \ldots, p.$$

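Now we note that

$$\lambda = \prod_{r=m+1}^{p} (1 - E_r^2) = (1 - E_p^2) \prod_{r=m+1}^{p-1} (1 - E_r^2), \tag{6}$$

where

$$1 - E_r^2 = \frac{\sum_{i=1}^{n} x^2_{ir\cdot(12\cdots l,\ l+1,\ l+2, \cdots, m,\ m+1, \cdots, r-1)}}{\sum_{i=1}^{n} x^2_{ir\cdot(12\cdots l,\ m+1,\ m+2, \cdots, r-1)}}.$$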

Since the statistic $\lambda$ is invariant to linear transformations of the random variates
$x_{m+1}, x_{m+2}, \ldots, x_p$, the distribution (5) may be simplified to

$$\prod_{r=m+1}^{p} (2\pi)^{-n/2} \exp\Big[-\tfrac{1}{2} \sum_{i} \Big(x_{ir} - \sum_{h} \beta_{rh} x_{ih}\Big)^2\Big] \prod_{i} dx_{ir}, \tag{7}$$

$$i = 1, 2, \ldots, n; \qquad h = 1, 2, \ldots, m.$$

Denote by $I(\beta_{pv}, \beta_{p-1,v}, \ldots, \beta_{m+1,v})$ the integral of (7) over the region $W - w$
complementary to the critical region $w$, where $\beta_{rv}$ stands for the entire set
of parameters $\beta_{r,l+1}, \ldots, \beta_{rm}$. We may first integrate over a subregion of
$W - w$ over which $\prod_{r=m+1}^{p-1} (1 - E_r^2)$ has a given value. Using identity (6) and
the result of section 2 it follows immediately that

$$I(\beta_{pv}, \beta_{p-1,v}, \ldots, \beta_{m+1,v}) \le I(0, \beta_{p-1,v}, \ldots, \beta_{m+1,v}).$$

If $\beta_{pv} = 0$, the distribution of $E_p^2$ is independent of that of $E_{p-1}^2$. Hence, starting
with $\beta_{pv} = 0$ in (7) and considering the integration for $E_{p-1}^2$ first, we obtain

$$I(0, \beta_{p-1,v}, \beta_{p-2,v}, \ldots, \beta_{m+1,v}) \le I(0, 0, \beta_{p-2,v}, \ldots, \beta_{m+1,v}).$$

Thus finally

$$I(\beta_{pv}, \beta_{p-1,v}, \ldots, \beta_{m+1,v}) < I(0, 0, \ldots, 0),$$

which proves the completely unbiassed character of the test.


5. Test of independence of sets of variates. Consider $n$ observations of $q$ sets
of random variates distributed in the multivariate normal form

$$\text{Const} \times \exp\Big[-\tfrac{1}{2} \sum_{r,s} \alpha^{rs} \sum_{i} (x_{ir} - m_r)(x_{is} - m_s)\Big] \prod_{i,r} dx_{ir}, \tag{8}$$

$$i = 1, 2, \ldots, n; \qquad r, s = 1, 2, \ldots, l_1, l_1+1, l_1+2, \ldots, l_2, l_2+1, \ldots, l_q; \qquad n > l_q.$$


Denote by $D_j$ the determinant of the sample dispersion matrix of the $j$th set of
variates and by $D(j)$ the determinant of the dispersion matrix of the first $j$
sets taken together. The Wilks' statistic used for testing the independence of the
$q$ sets is given by

$$W = \frac{D(q)}{\prod_{j=1}^{q} D_j} = \prod_{j=2}^{q} \lambda_j, \tag{9}$$

where

$$\lambda_j = \frac{D(j)}{D_j\, D(j-1)}, \qquad j = 2, 3, \ldots, q.$$

The region $W - w$ complementary to the critical region $w$ is defined by

$$\prod_{j=2}^{q} \lambda_j > \text{a positive constant}.$$


The statistic W is invariant to linear transformations within each set of variates.
The distribution (8) may therefore without loss of generality be written in the
form


(10) n 


j«l Lr-lj-i+l 


it (2777-) 


2 \-(n/ 2 ) 



1,-1 


/ l.r 


dx. 





Let $B_j$ ($j = 2, 3, \ldots, q$) stand for the set of constants

$$\beta_{rs}, \qquad r = l_{j-1}+1, l_{j-1}+2, \ldots, l_j, \qquad s = 1, 2, \ldots, l_{j-1},$$

and let

$$B_j = 0 \tag{11}$$

imply the vanishing of all the constants of the set $B_j$. The $q$ sets of variates
will be independent if (11) holds for all values of $j$ from 2 to $q$. Denote by
$I(B_q, B_{q-1}, \ldots, B_2)$ the integral of (10) over the region $W - w$. Integrating
first over the sub-region of $W - w$ for which

$$\prod_{j=2}^{q-1} \lambda_j$$

has a given value and using the result of section 4, it follows that

$$I(B_q, \ldots, B_2) < I(0, B_{q-1}, \ldots, B_2).$$

Also if $B_q = 0$, $\lambda_q$ is distributed independently of $\lambda_{q-1}$. Hence starting with
$B_q = 0$ in (10) and integrating for $\lambda_{q-1}$ first, we obtain

$$I(0, B_{q-1}, B_{q-2}, \ldots, B_2) < I(0, 0, B_{q-2}, \ldots, B_2).$$

Thus finally, $I(B_q, B_{q-1}, \ldots, B_2) < I(0, 0, \ldots, 0)$, which proves the completely
unbiassed character of the Wilks criterion.


REFERENCES

[1] J. F. Daly, "On the unbiassed character of likelihood ratio tests for independence in normal systems," Annals of Math. Stat., Vol. 11 (1940), p. 1.
[2] P. C. Tang, "The power function of analysis of variance tests with tables and illustrations of their use," Stat. Res. Mem., Vol. 2 (1938), p. 126.
[3] R. D. Narain, "A new approach to sampling distributions of the multivariate normal theory, Part I," Jour. Ind. Soc. Agr. Stat., Vol. 1 (1948), p. 59.
[4] R. D. Narain, "A new approach to sampling distributions of the multivariate normal theory, Part II," Jour. Ind. Soc. Agr. Stat., Vol. 1 (1948), p. 137.


ON THE DISTRIBUTION OF THE TWO CLOSEST AMONG A SET OF 

THREE OBSERVATIONS¹

By G. R. Seth

Iowa State College

1. Introduction. In this note we obtain the joint distribution of the two closest
observations $x'$, $x''$ ($x' < x''$) of the set $x_1, x_2, x_3$ ($x_1 < x_2 < x_3$) when the
distribution of $x_1, x_2, x_3$ is given or can be obtained.² We will assume that in general
the density function is given by $f(x_1, x_2, x_3)$ and that it is continuous in the

¹ The results in this paper were presented at a meeting of the Institute of Mathematical
Statistics in Madison, Wisconsin, September 9, 1948.

² The author's attention was drawn to this problem while visiting the National Bureau
of Standards in the Spring of 1948, by Mr. Julius Lieblein of the Statistical Engineering




variables involved. We also find the distributions of certain statistics depending
on $x'$ and $x''$. We will denote the density and the cumulative distribution function
of a normal variate with mean zero and unit variance by $\phi(x)$ and $G(x)$.

2. Distribution of the two closest. Let $x'$, $x''$ be the two closest among the set
of $x_1, x_2, x_3$ ($x_1 < x_2 < x_3$). Let $P(S_1, S_2, \ldots, S_k)$ denote the probability that
the events $S_1, S_2, \ldots, S_k$ occur. Let us consider $P(x' < s, x'' < t)$ for $t > s$.
For $s > t$, it reduces to $P(x'' < t)$, i.e., the marginal cumulative distribution of
$x''$.

Now

$$P(x' < s, x'' < t) = P(x_1 < s, x_2 < t, x_2 - x_1 < x_3 - x_2) + P(x_2 < s, x_3 < t, x_3 - x_2 < x_2 - x_1). \tag{1}$$

The equalities, here as well as elsewhere, are omitted as the variables admit
continuous distributions. Let the first and second terms on the right side in
(1) be denoted by $P(A)$ and $P(B)$ respectively, where $A$, $B$ denote the events in
the respective brackets. The event $B$ can be further split up into more elementary
events whose probabilities can be easily found. $(B)$ can be seen to be
equivalent to

$$\Big(x_1 < 2s - t,\ x_1 < x_2 < \frac{x_1 + t}{2},\ x_2 < x_3 < 2x_2 - x_1\Big)$$
$$+\ \big(2s - t < x_1 < s,\ x_1 < x_2 < s,\ x_2 < x_3 < 2x_2 - x_1\big)$$
$$+\ \Big(x_1 < 2s - t,\ \frac{x_1 + t}{2} < x_2 < s,\ x_2 < x_3 < t\Big).$$

We may write (1) in the form of integrals and differentiating under the integral
sign with respect to $t$ and $s$ we obtain

$$\frac{\partial^2 P}{\partial s\, \partial t} = \int_{2t-s}^{\infty} f(s, t, x_3)\, dx_3 + \int_{-\infty}^{2s-t} f(x_1, s, t)\, dx_1. \tag{2}$$

The right hand side of (2) gives the density function of $x'$, $x''$ at $x' = s$, $x'' = t$.
Let $f_{ij}(x_i, x_j)$ be the density function of $x_i$ and $x_j$ ($i < j = 1, 2, 3$). Then the
density function $p(x', x'')$ of $x'$ and $x''$ can be put into the form

$$p(x', x'') = f_{12}(x', x'')\,[1 - F_3(2x'' - x' \mid x_1 = x', x_2 = x'')] + f_{23}(x', x'')\,[F_1(2x' - x'' \mid x_2 = x', x_3 = x'')], \tag{3}$$

where $F_i(x \mid x_j = l, x_k = m)$ represents the cumulative distribution function
of the conditional density function of $x_i$ when $x_j$ and $x_k$ are fixed at the values $l$
and $m$ respectively. If, before ordering, the three observations are independent

Laboratory. He understands that Mr. Lieblein has in preparation for submission to the
Journal of Research of the National Bureau of Standards a paper giving intensive consideration
to the closest pair and other aspects of samples of three observations.



and from the same population having the density function $f(x)$, then (3) with
the help of

$$f(x_1, x_2, x_3) = 3!\, f(x_1) f(x_2) f(x_3)$$

reduces to

$$p(x', x'') = 6 f(x') f(x'')\,[1 - F(2x'' - x') + F(2x' - x'')], \tag{4}$$

where $F(x) = \int_{-\infty}^{x} f(x)\, dx$.
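
A quick numerical check of (4) is possible. The sketch below assumes that (4) reads $p(x', x'') = 6 f(x') f(x'') [1 - F(2x'' - x') + F(2x' - x'')]$ and takes a standard normal parent; the function names, grid step and integration range are illustrative choices, not part of the paper. The density is integrated over $x'$ and the gap $x'' - x' > 0$ by the midpoint rule, and the total should come out close to one.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def big_phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def closest_pair_density(lo, hi):
    """Assumed form of (4) at x' = lo, x'' = hi (lo < hi), standard normal parent."""
    return 6.0 * phi(lo) * phi(hi) * (1.0 - big_phi(2.0 * hi - lo) + big_phi(2.0 * lo - hi))


def total_mass(step=0.05, half_width=8.0):
    """Midpoint-rule integral over x' in (-w, w) and gap x'' - x' in (0, 2w)."""
    cells = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(cells):
        lo = -half_width + (i + 0.5) * step
        for j in range(cells):
            gap = (j + 0.5) * step
            total += closest_pair_density(lo, lo + gap)
    return total * step * step


if __name__ == "__main__":
    print(total_mass())
```

The two bracketed terms correspond to the two ways the closest pair can arise (the lower two or the upper two observations), so their contributions must add to a total probability of one.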


3. Joint distribution of $(x'' - x')$ and $(x'' - x')/(x_3 - x_1)$. Let $F_1(s, t)$ denote
the cumulative distribution function of $u = x'' - x'$ and $w = \dfrac{x'' - x'}{x_3 - x_1}$. Then

$$F_1(s, t) = P\left[x'' - x' < s,\ \frac{x'' - x'}{x_3 - x_1} < t\right]. \tag{5}$$

The range for $u$ is $(0, \infty)$ and $w$ varies between 0 and $\tfrac{1}{2}$, and thus we limit
ourselves to $s$ varying from 0 to $\infty$, and $t$ varying in $(0, \tfrac{1}{2})$.

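After some manipulation of the probability statement and differentiating with
respect to $s$ and $t$ under the integral sign, in a manner similar to that of the
previous section, we obtain the joint density function of $u$ and $w$, given by

$$\frac{\partial^2 F_1(s, t)}{\partial s\, \partial t} = \frac{s}{t^2} \left[\int_{-\infty}^{\infty} f\Big(y,\ y + s,\ y + \frac{s}{t}\Big)\, dy + \int_{-\infty}^{\infty} f\Big(y,\ y + \frac{s}{t} - s,\ y + \frac{s}{t}\Big)\, dy\right] = h(s, t) \text{ (say)}. \tag{6}$$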


4. Applications to normal distributions. Let $f(x)$ in (4) be the density function
of a normal distribution with mean $\theta$ and variance unity; then (6) reduces to

$$h(u, w) = \frac{2\sqrt{3}\, u}{\pi w^2} \exp\left[-\frac{u^2 (1 - w + w^2)}{3 w^2}\right]. \tag{7}$$

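Further the marginal density of $u$ and $w$ will be given by

$$p(u) = 6\sqrt{2}\, \phi\Big(\frac{u}{\sqrt{2}}\Big) \left[1 - G\Big(\sqrt{\tfrac{3}{2}}\, u\Big)\right], \qquad 0 < u < \infty, \tag{8}$$

$$p(w) = \frac{3\sqrt{3}}{\pi (1 - w + w^2)}, \qquad 0 < w < \tfrac{1}{2}, \tag{9}$$

respectively. The distribution of $w$ has been obtained by J. Lieblein in an unpublished
paper.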
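
The distribution of $w$ lends itself to a simple simulation check. The sketch below rests on an assumption: that the density (9) has the form $3\sqrt{3}/[\pi(1 - w + w^2)]$ on $0 < w < \tfrac{1}{2}$, which integrates to the distribution function $(6/\pi)[\arctan((2w - 1)/\sqrt{3}) + \pi/6]$. The function names and the number of draws are illustrative only.

```python
import math
import random


def closest_gap_ratio(sample):
    """w = (gap between the two closest) / (range) for a sample of three."""
    x1, x2, x3 = sorted(sample)
    return min(x2 - x1, x3 - x2) / (x3 - x1)


def ratio_cdf(w):
    """CDF implied by the assumed density 3*sqrt(3) / (pi * (1 - w + w^2)) on (0, 1/2)."""
    return (6.0 / math.pi) * (math.atan((2.0 * w - 1.0) / math.sqrt(3.0)) + math.pi / 6.0)


def empirical_cdf(w, draws=100000, seed=1):
    """Fraction of simulated standard normal triples whose ratio falls below w."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        sample = [rng.gauss(0.0, 1.0) for _ in range(3)]
        if closest_gap_ratio(sample) < w:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    for w in (0.1, 0.25, 0.4):
        print(w, round(ratio_cdf(w), 4), round(empirical_cdf(w), 4))
```

Because $w$ depends only on differences of the observations, the choice of mean zero in the simulation loses no generality.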

From (2) we can also obtain the joint density function of $u = x'' - x'$ and
$v = \dfrac{x' + x''}{2}$. When we integrate this joint density function with respect to $u$,
we obtain the density function of $v = \dfrac{x' + x''}{2}$ as given by


piv) ~ ()\/20['\/2(i’ — 0)1 


(16) 


1 + a 


(V2{v -J)\ 

\ Vrf ~) 

— 2 f 4 >{x')G 

Jo 


Sx 

^ + V - d) dx 


Vs 

The mean and the variance of the distribution of $v$ are given by $\theta$ and
$\tfrac{1}{2} + \dfrac{\sqrt{3}}{4\pi}$ respectively.

It may be remarked that if there is a suspicion that one of the extreme observations
in a sample of three does not belong to the normal population under consideration,
then the median of the sample is a better estimate than the average of
the two closest. The efficiency of the latter compared to that of the former is
about 70%, for the variance of the median in this case is given by $1 - \dfrac{\sqrt{3}}{\pi}$
compared to $\tfrac{1}{2} + \dfrac{\sqrt{3}}{4\pi}$ of $v$, the average of the two closest. The efficiency is here
defined as the ratio of the variances for the two estimates.
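
The quoted efficiency can be checked by simulation. In the sketch below the two variances are read as $1 - \sqrt{3}/\pi$ for the median and $\tfrac{1}{2} + \sqrt{3}/(4\pi)$ for the average of the two closest; that reading, the function names and the number of draws are assumptions made for illustration.

```python
import math
import random

# Variances as read from the text (assumed forms).
THEORY_MEDIAN = 1.0 - math.sqrt(3.0) / math.pi
THEORY_CLOSEST = 0.5 + math.sqrt(3.0) / (4.0 * math.pi)


def variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def simulate(draws=100000, seed=2):
    """Return (variance of median, variance of mean of the two closest) for N(0, 1) triples."""
    rng = random.Random(seed)
    medians, closest_means = [], []
    for _ in range(draws):
        x1, x2, x3 = sorted(rng.gauss(0.0, 1.0) for _ in range(3))
        medians.append(x2)
        if x2 - x1 < x3 - x2:
            closest_means.append(0.5 * (x1 + x2))
        else:
            closest_means.append(0.5 * (x2 + x3))
    return variance(medians), variance(closest_means)


if __name__ == "__main__":
    var_median, var_closest = simulate()
    print("median:", round(var_median, 4), "theory:", round(THEORY_MEDIAN, 4))
    print("closest two:", round(var_closest, 4), "theory:", round(THEORY_CLOSEST, 4))
    print("efficiency:", round(var_median / var_closest, 3))
```

With these two expressions the ratio of variances is about 0.70, in agreement with the figure given in the text.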


ERRATA 
By W. Feller 
Cornell University 

The author regrets the following inconsequential, but very disturbing, slips 
in his paper “On the Kolmogorov-Smirnov limit theorems for empirical distri¬ 
butions” (Annals of Math Stat, Vol. 19 (1948), pp. 177-189): 

(1) In equation (1.4) on p. 178, the exponent $-\nu^2 z^2$ should be replaced by
$-2\nu^2 z^2$. The same copying error occurs in the description of Smirnov's table on
p. 279. The proof is correct as it stands.

(2) In the formulation of the continuity theorem on p. 180 it is claimed that
$u_k \to f(t)$ whereas in reality the continuity theorem permits only the conclusion
that

$$\sum_{r=1}^{k} u_r \to \int_0^t f(x)\, dx. \tag{*}$$

This slip in formulation in no way affects the proofs since only (*) is used.
(The assertion that the step functions $\{f_r\}$ converge pointwise is not based on a




second application of the continuity theorem, but on the obvious fact that (*)
implies



I dr, 


where the step function $\{f_r\}$ converges uniformly to a continuous monotonic
$g(t)$.)

The following corrections apply to the paper, "On the normal approximation
to the binomial distribution" (Annals of Math. Stat., Vol. 16 (1945), pp. 319-
329).

(1) Equation (27) gives two variants of an estimate for the error $\rho$. The second
should simply restate the first one in terms of the variable $x$; in other words,
the expression (p" + q^) in the second line of (27) should bo replaced by 
pTl - V^/oT^ -b f/(l + (/•r/<r)’. 

(2) The estimate p < ff'V'idb givi'ii in (28) is not valid over the entire range 
for which it is claimed. However, the further theory depends only on the fact 
that p = 0(tr~^), and the estimate, p < (r"®/,3() i.s both correct and sufficient for 
our purposes. (Actually, no changes whatever are required in the proofs, since
(28) is used explicitly only for a range where it is correct as stated).

(3) On p. 324 it is stated that under the conditions of the main theorem
(p. 325) $k \ge 4$, $n - k \ge 4$, whereas in reality the value 3 can occur in extreme
cases. Fortunately, the assertion is not used anywhere in the proof, and the
error $\rho$ is negligible in all cases.

Accordingly, no changes are required either in the formulation or the proof of
the theorems. I am indebted to Dr. W. Hoeffding for calling my attention to the
slips.

(4) The first minus sign in footnote 5 should be an equality sign and the second
minus in (70) a plus.


ABSTRACTS OF PAPERS

(Abstracts of papers presented at the Chapel Hill meeting of the Institute, March 17-18, 1950)

1. A Method of Estimating the Parameters of an Autoregressive Time Series.
S. G. Ghurye, University of North Carolina.

The general autoregressive process of the second order is defined by the equations

$$\xi_t = x_t + \eta_t, \qquad x_t + a_1 x_{t-1} + a_2 x_{t-2} = \epsilon_t,$$

where $\xi_t$ is the value actually observed at time $t$, $x_t$ the corresponding theoretical value,
$\epsilon_t$ the disturbance and $\eta_t$ the superposed variation. The estimates of $a_1$, $a_2$ given by Yule's
method are biased and inconsistent if $\eta_t$ is not identically zero, the permanent bias being a
function of the unknown variance of $\eta_t$. The present paper proposes a method of estimation


which is unaffected by the presence of $\eta_t$, and seems to be better than any other known
method; and this conjecture is supported by the results of application to observational and
artificial series. In this method the estimates of $a_1$, $a_2$ are obtained by minimizing

n X 

I'jy _ O) I "b + OiXl+k-l + 

whcK' n IS some immlier siiiull in coiH|iariH(in with A'’ (wliieli ia the number of observa¬ 
tions). Ill llui aliove cxiiie.shioii llie usual apinoximation of .subatitiitmg {N - h - 2)ii foi 
a may be matlc fm compulatiimal convomcnco The mctliod has hoen used foi 
fitting autoregressive processes to the series of annual averages of Wolfer's sunspot
numbers and that of Myrdal's Swedish cost of living index numbers. The method is applicable
to higher order processes.

2. Most Powerful Rank Order Tests. (Preliminary Report). Wassily Hoeffding,
University of North Carolina.

Let $X_{11}, \ldots, X_{1n_1}, \ldots, X_{k1}, \ldots, X_{kn_k}$ be random variables with a joint probability
function $P(S)$ and let $P\{X_{ig} = X_{ih}\} = 0$ if $g \ne h$ ($i = 1, \ldots, k$). Let $H_0$ be a hypothesis
which implies that $P(S)$ is invariant under all permutations of $X_{i1}, \ldots, X_{in_i}$ ($i = 1, \ldots, k$).
Let $r_{ij}$ ($j = 1, \ldots, n_i$) be the ranks of $X_{i1}, \ldots, X_{in_i}$. Under $H_0$ the $M = \prod n_i!$ rank
permutations $R = (r_{11}, \ldots, r_{1n_1}, \ldots, r_{k1}, \ldots, r_{kn_k})$ have the same probability
$P(R) = M^{-1}$. A test which depends only on the permutations $R$ is called a rank order test (R.O.T.).
A R.O.T. of size $m/M$ which is most powerful (M.P.) against a simple alternative, $P_1(S)$,
is determined by $m$ permutations $R$ for which $P_1(R)$ takes on its $m$ largest values.

For example, let the pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$ be independent and identically
distributed. Let $H_0$ state that $X_1$, $Y_1$ are independent, and let $H_1(\rho)$ be the hypothesis that
$X_1$, $Y_1$ have a bivariate normal distribution with correlation $\rho$. We may assume that
$X_1 < \cdots < X_n$ and consider the ranks $r_i$ of the $Y$'s only. A R.O.T. which is uniformly M.P.
against all $H_1(\rho)$ with $\rho > 0$ does not exist except for small $n$. The M.P.R.O.T. against small
$\rho > 0$ is determined by the largest values of $\sum_i (EZ_i)(EZ_{r_i})$, where $EZ_i$ is the expectation
of the $i$-th order statistic in a sample of $n$ from a standard normal distribution. The M.P.
unbiased R.O.T. against small values of $|\rho|$ is based on the statistic $\sum_i \sum_j (EZ_i Z_j)(EZ_{r_i} Z_{r_j})$.
The M.P.R.O.T. against $\rho$ close to 1 is obtained by expanding the probability of $(r_1, \ldots, r_n)$
in powers of $((1 - \rho)/(1 + \rho))^{1/2}$.

3. The Comparison of Percentages in Matched Samples. William G. Cochran,
Johns Hopkins University.

In this paper the familiar $\chi^2$ test for comparing the percentages of successes in a number of
independent samples is extended to the situation in which each member of any sample is
matched in some way with a member of every other sample. This problem has been encountered
in the fields of psychology, pharmacology, bacteriology, and sample survey design.
A solution has been given by McNemar (1949) when there are only two samples.

In the more general case, the data are arranged in a two-way table with $r$ rows and $c$
columns, in which each column represents a sample and each row a matched group. The
test criterion proposed is

$$Q = \frac{c(c - 1) \sum (T_j - \bar{T})^2}{c(\sum u_i) - (\sum u_i^2)},$$

where $T_j$ is the total number of successes in the $j$th sample and $u_i$ the total number of
successes in the $i$th row. If the true probability of success is the same in all samples, the
limiting distribution of $Q$, when the number of rows is large, is the $\chi^2$ distribution with $(c - 1)$
degrees of freedom. The relation between this test and the ordinary $\chi^2$ test, valid when
samples are independent, is discussed.

In small samples the exact distribution of $Q$ can be constructed by regarding the row
totals as fixed, and by assuming that on the null hypothesis every column is equally likely
to obtain one of the successes in a row. This exact distribution is worked out for eight
examples in order to test the accuracy of the $\chi^2$ approximation to the distribution of $Q$ in
small samples. The number of samples ranged from $c = 3$ to $c = 5$. The average error in the
estimation of a significance probability was about 11 per cent in the neighborhood of the
5 per cent level and about 21 per cent in the neighborhood of the 1 per cent level. Correction
for continuity did not improve the accuracy of the approximation, although it is recommended
when there are only two samples. Another approximation, obtained by scoring each
success as "1" and each failure as "0" and performing an analysis of variance on the data,
was also investigated. The F-test, corrected for continuity, performed about as well as the
$\chi^2$ approximation (uncorrected), but is slightly more laborious.

The problem of subdividing $\chi^2$ into components for more detailed tests is briefly
discussed.
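
The criterion is straightforward to compute. The sketch below takes the statistic in the form $Q = c(c-1)\sum (T_j - \bar{T})^2 / [c \sum u_i - \sum u_i^2]$ and applies it to a small made-up table of 0/1 outcomes; the function name and the data are hypothetical and serve only as an illustration.

```python
def cochran_q(table):
    """Q for a list of rows of 0/1 outcomes (rows = matched groups, columns = samples)."""
    c = len(table[0])
    column_totals = [sum(row[j] for row in table) for j in range(c)]  # T_j
    row_totals = [sum(row) for row in table]                          # u_i
    mean_total = sum(column_totals) / c                               # T-bar
    numerator = c * (c - 1) * sum((t - mean_total) ** 2 for t in column_totals)
    denominator = c * sum(row_totals) - sum(u * u for u in row_totals)
    if denominator == 0:
        raise ValueError("every row is all successes or all failures; Q is undefined")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical data: six matched groups, three samples.
    example = [
        [1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [0, 0, 0],
        [1, 0, 0],
        [1, 1, 0],
    ]
    print(cochran_q(example))  # refer to chi-square with c - 1 = 2 degrees of freedom
```

Rows in which every sample succeeds or every sample fails contribute nothing to the numerator or the denominator. With two samples the statistic reduces to $(b - d)^2/(b + d)$, where $b$ and $d$ are the numbers of discordant rows of each kind.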

4. A Method of Estimating Components of Variance in Disproportionate Numbers.
H. L. Lucas, North Carolina State College.

By including sufficient effects in the forward solution of the Abbreviated Doolittle
method, components of variance may be estimated from disproportionate data. The procedure
is very systematic, and thus is adaptable to routine computational work. The
computations will be described, and the utility of the method briefly discussed.

5. On the Theory of Unbiased Tests of Simple Statistical Hypotheses Specifying 
the Values of Two Parameters. (Preliminary Report). Stanley L. Isaacson, 
Columbia University. 

In the Neyman-Pearson theory of testing simple hypotheses, in the one-parameter case,
a locally best unbiased region is called "type A." It is obtained by maximizing the curvature
of the power curve at the point $\theta = \theta^0$ specified by the hypothesis, subject to the conditions
of size and unbiasedness. For the two-parameter case, Neyman and Pearson considered
"type C" regions (Stat. Res. Mem., Vol. 2 (1938), p. 36). The definition of these regions
requires one to choose in advance a family of ellipses of constant power in an infinitesimal
neighborhood of the point $(\theta_1, \theta_2) = (\theta_1^0, \theta_2^0)$ specified by the hypothesis. The natural
generalization of a "type A" region is a "type D" region, which maximizes the Gaussian
curvature of the power surface at $(\theta_1^0, \theta_2^0)$, subject to the conditions of size and unbiasedness.
This definition does not require one to choose a family of ellipses in advance. This
approach leads to a new problem in the calculus of variations. A sufficient condition is
obtained which plays the role of the Neyman-Pearson fundamental lemma in the "type A"
case. An illustrative example is given. (Prepared under sponsorship of the Office of Naval
Research.)

6. A Note on Orthogonal Arrays. Raj Chandra Bose, University of North
Carolina.

Consider a matrix $A = (a_{ij})$ with $N$ rows and $m$ columns, each element $a_{ij}$ standing for
one of the $s$ integers $0, 1, 2, \ldots, s - 1$. Let us take the partial matrix obtained by choosing
any $t \le m$ columns of $A$. Each row now consists of an ordered $t$-plet of numbers, and since each
element has one of $s$ possible values, there are $s^t$ possible $t$-plets. The matrix $A$ may be
called an orthogonal array $(N, m, s, t)$ of size $N$, $m$ constraints, $s$ levels and strength $t$, if by
choosing any $t$ columns whatsoever every possible $t$-plet occurs the same number of times.
Clearly $N = \lambda s^t$ where $\lambda$ is an integer. Such arrays have been considered by Rao and are
useful for various experimental designs. The existence of an orthogonal array $(s^2, m, s, 2)$ is
equivalent to the existence of a set of orthogonal Latin squares of side $s$ and $m$ constraints
(i.e., the number of Latin squares in the set is $m - 2$). The fundamental question that can
be asked regarding orthogonal arrays is the following: What is the maximum number of
constraints for an orthogonal array, given $N$, $s$ and $t$? Denote this number by $f(N, s, t)$;
then from known properties of Latin squares $f(s^2, s, 2) = s + 1$, if $s$ is a prime or a prime
power, and a theorem by Mann states that $f(s^2, s, 2) \ge \gamma + 1$, if $s = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$, where
$p_1, \ldots, p_k$ are different primes, and $\gamma$ is the minimum of $p_1^{n_1}, p_2^{n_2}, \ldots, p_k^{n_k}$. The following
generalisation of Mann's theorem is proved in this note:

$$f(N_1 N_2 \cdots N_k,\ s_1 s_2 \cdots s_k,\ t) \ge \min[f(N_1, s_1, t), f(N_2, s_2, t), \ldots, f(N_k, s_k, t)].$$
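
The defining property of an orthogonal array can be verified mechanically. The following sketch checks whether a given matrix is an orthogonal array of $s$ levels and strength $t$ in the sense defined above; the function name is hypothetical, and the second example builds an array of 9 rows and 4 constraints from the two orthogonal Latin squares of side 3 in the usual way.

```python
from collections import Counter
from itertools import combinations, product


def is_orthogonal_array(rows, s, t):
    """True if every choice of t columns shows each of the s**t possible t-plets equally often."""
    n_rows = len(rows)
    m = len(rows[0])
    if n_rows % (s ** t) != 0:
        return False  # N must equal lambda * s**t
    lam = n_rows // (s ** t)
    for cols in combinations(range(m), t):
        counts = Counter(tuple(row[c] for c in cols) for row in rows)
        if any(counts[tup] != lam for tup in product(range(s), repeat=t)):
            return False
    return True


if __name__ == "__main__":
    small = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    print(is_orthogonal_array(small, s=2, t=2))

    # Rows (i, j, i + j, i + 2j) mod 3: two orthogonal Latin squares of side 3.
    from_squares = [(i, j, (i + j) % 3, (i + 2 * j) % 3) for i in range(3) for j in range(3)]
    print(is_orthogonal_array(from_squares, s=3, t=2))
```

The second array has $s + 1 = 4$ constraints for $s = 3$, the maximum stated above for a prime number of levels.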

7. Transformations Related to the Angular and the Square Root. Murray F.
Freeman and John W. Tukey, Princeton University.

The use of transformations to stabilize the variance of binomial or Poisson data is
familiar (Anscombe, Bartlett, Curtiss, Eisenhart). The comparison of transformed binomial
or Poisson data with percentage points of the normal distribution to make approximate
significance tests or to set approximate confidence intervals is less familiar. Mosteller and
Tukey have recently made a graphical application of a transformation related to the
square-root transformation for such purposes, where the use of "binomial probability
paper" avoids all computation. We report here on an empirical study of a number of
approximations, some intended for significance and confidence work, and others for variance
stabilization. (Prepared in connection with research sponsored by the Office of Naval
Research.)

8. Standard Inverse Matrices for Fitting Polynomials. F. J. Verlinden, North
Carolina State College.

For fitting polynomials of the type $y = b_0 x^0 + b_1 x + b_2 x^2 + \cdots + b_m x^m$, with the $x$'s
equally spaced, published tables of orthogonal polynomials may be used. This procedure
does not yield the $b$'s directly, nor their variances or covariances, although such may be
obtained by proper computations which are moderately tedious. In some types of statistical
work, the $b$'s and their variances and covariances may be desired. These may of course be
obtained directly by the method of least squares but the computational work is prodigious
relative to that for the orthogonal polynomial approach. When the $x$'s are equally spaced
the elements of the variance-covariance matrix may be put in the simple form of sums of
powers (including the zero power) of successive integers from zero to $n$ ($n$ equals one less
than the number of observations). The elements of the inverses of matrices of this type
have been worked out algebraically in terms of $n$ for polynomials up to and including the
quintic ($m = 5$). With these standard inverse matrices, the $b$'s and their variances and
covariances may quickly be obtained once the elements are evaluated numerically. These
elements have been evaluated numerically up to $n = 20$.
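
The idea can be sketched in a few lines. The code below is not the author's tabulation; it is an illustrative construction, with hypothetical function names, of the matrix of sums of powers of $0, 1, \ldots, n$, its exact inverse by Gauss-Jordan elimination in rational arithmetic, and the resulting least-squares coefficients.

```python
from fractions import Fraction


def power_sum_matrix(n, degree):
    """Normal-equations matrix for x = 0, 1, ..., n: entry (i, j) is the sum of x**(i + j)."""
    return [[Fraction(sum(x ** (i + j) for x in range(n + 1)))
             for j in range(degree + 1)] for i in range(degree + 1)]


def invert(matrix):
    """Exact Gauss-Jordan inverse of a non-singular square matrix of Fractions."""
    size = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def fit(ys, degree):
    """Least-squares coefficients b_0, ..., b_degree for y observed at x = 0, 1, ..., n."""
    n = len(ys) - 1
    inverse = invert(power_sum_matrix(n, degree))
    rhs = [sum(Fraction(y) * x ** i for x, y in enumerate(ys)) for i in range(degree + 1)]
    return [sum(inverse[i][j] * rhs[j] for j in range(degree + 1)) for i in range(degree + 1)]


if __name__ == "__main__":
    ys = [1 + 2 * x + 3 * x * x for x in range(6)]
    print(fit(ys, 2))
```

Multiplying the inverse matrix by the residual variance gives the variances and covariances of the $b$'s, which is why a table of such inverses saves the repeated solution of the normal equations.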

9. Mathematical Models in Biology. J. A. Rafferty, Department of Biometrics,
School of Aviation Medicine, Randolph Field, Texas.

From the point of view of a bio-medical research administrator, mathematical models
will play a greater role in biological research than heretofore. In anticipation of this
trend, certain philosophical implications of models in biological theory and scientific theory
in history are examined. A hierarchy of abstraction-levels in biology is delineated, and the
role of mathematical models at these levels is illustrated by examples from the literature.
Proposals are made for a concentration of mathematical effort on certain important biological
problems. Remarks are made on the capabilities and limitations of models in biology.

10. Small Sample Performance of Biological Statistics. Irwin Bross, Johns
Hopkins University.

In this paper the dilution method for estimating bacterial density is investigated by an
exact small sample method and also by an approximate one. Methodologies and design of
experiments are compared for various small sample cases.

11. Methodology in the Study of Physical Measurements of School Children.
B. G. Greenberg and A. Hughes Bryan, University of North Carolina.

In a series of investigations to determine by small-sampling technique what physical
differences, if any, occur between children of differing socio-economic backgrounds, several
problems of methodology arose. A pilot study was undertaken to assure maximum efficiency
at each step. This paper reports some of these results. It was found that the children could
remain dressed (with the exception of boys' bi-iliac measurement) without changing the
magnitude of the differences. The pilot study enabled us to decide how many observers to
use, and how much duplication of measurements by them was necessary. Minimum sample
sizes were estimated to indicate physical differences of predetermined magnitudes. It was
found that the age grouping 96-143 months was optimal from the standpoint of indicating
physical differences between children of differing socio-economic levels. Boys and girls in
the upper socio-economic levels were both taller and heavier for their age in this age group.
There were no weight differences, however, when weight was adjusted for age and height.
Measurement of the bi-iliac and transverse chest diameter provided little additional
information on physical differences. The calf circumference, an indicator of muscle mass and
subcutaneous fat, is suggested as being a sensitive supplementary index to indicate physical
differences when age and height are adjusted.

12. Tetrad Analysis in Yeast. A. S. Householder, Oak Ridge National Laboratory,
Oak Ridge, Tennessee.

In neurospora all four products of meiosis are recovered in the four spores of an ascus.
In crosses AB X ab the asci are of three types, designated I, II or III according as all four,
none, or two spores resemble parents. Frequencies of these types, $P$, $P'$ and $P''$ are the
observables. If there were no exchange $P''$ would be zero; and one should have P' = 0
or I according to whether the loci were on the same or different chromosomes.

Assuming only that no exchange occurs between sister chromatids and neglecting
chromatid interference, one can calculate without further assumptions a frequency $P''$ of
exchanges between a single locus and its centromere from data on three or more genes taken
in pairs by equations

$$s_{ij} = s_{0i} s_{0j}, \qquad P'' = 2(1 - s)/3,$$

where the subscript 0 refers to a centromere. Lindegren makes such calculations from his
own data, by taking groups of three, but makes no effort to reconcile discrepancies.
Neyman's modified chi-square, however, permits combining all observations in a set of
equations that yields easily to rapidly converging iterative solution. The equations are


2s. S s5(n„ + n.',)“(«->= 2 + <,)=(2ar,‘- nir*), 

jr> iyi( 


where $n_{ij}$ is the number in classes I and II combined for the loci $i$ and $j$, $n''_{ij}$ the number
in class III, and only those pairs $(i, j)$ are included which are found to be independent.
The argument of A. R. G. Owen (Proc. Roy. Soc., Ser. B, Vol. 136 (1949), pp. 67-94)
can be paraphrased for the present case and a suitable generating function $P(\lambda, u)$ is being
sought providing a metric. The specific one proposed by Owen is ruled out since $s = P(-\tfrac{1}{2}, u)$
takes on a negative value for one locus, which is not possible with Owen's function.
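
The two relations quoted in this abstract can be illustrated with a small sketch. It assumes, as one reading of the abstract, that $P'' = 2(1 - s)/3$ is applied both to a locus and its centromere and to a pair of loci, so that $s = 1 - 3P''/2$. The function names and the numerical values are hypothetical and are not taken from the paper.

```python
def s_from_tetratype(p2):
    """Invert P'' = 2(1 - s)/3, giving s = 1 - 3 P''/2."""
    return 1.0 - 1.5 * p2


def tetratype_from_s(s):
    """Apply P'' = 2(1 - s)/3."""
    return 2.0 * (1.0 - s) / 3.0


def pair_tetratype(p2_locus_i, p2_locus_j):
    """Expected P'' for loci i and j from their locus-to-centromere values.

    Uses the product rule s_ij = s_0i * s_0j, which the abstract states for
    independent loci under no sister-chromatid exchange and no chromatid
    interference.
    """
    s_ij = s_from_tetratype(p2_locus_i) * s_from_tetratype(p2_locus_j)
    return tetratype_from_s(s_ij)


if __name__ == "__main__":
    # Hypothetical locus-to-centromere frequencies, for illustration only.
    print(pair_tetratype(0.2, 0.4))
```

With these illustrative inputs, $s_{0i} = 0.7$ and $s_{0j} = 0.4$, so $s_{ij} = 0.28$ and the pair value is $2(0.72)/3 = 0.48$.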


13. Contribution to the Probabilistic Theory of Neural Nets. I. Randomization
of Refractory Periods and of Stimulus Intervals. Anatol Rapoport, University
of Chicago.

Aggregates of neurons are considered in which the frequency of occurrence of neurons
with a specified value of the refractory period follows certain probability distributions.
Input-output functions are derived from such aggregates. In particular, if input and output
intensities are defined in terms of stimulus frequencies and firing frequencies per neuron
respectively, it is shown that a rectangular distribution of refractory periods leads to a
logarithmic input-output curve. If input and output are defined in terms of the total
number of stimuli and firings in the aggregate, it is shown how the “mobilization” picture
leads to the logarithmic input-output curve.

By randomizing the intervals between stimuli received by a single neuron and by
introducing an inhibitory neuron a very simple “filter net” can be constructed whose output
will be sensitive to a particular range of the input, and this range can be made arbitrarily
small.


14. Theoretical and Experimental Aspects in the Removal of Air-Borne Matter
by the Human Respiratory Tract. H. D. Landahl, University of Chicago.

The principal factors governing the fate of a particle in the respiratory tract are
impaction due to inertia, settling due to gravity and Brownian movements. For a given
respiratory pattern, it is possible to calculate the probable fate of a particle from a knowledge of
the geometry of the passages. These calculations have been carried out in such a manner as
to obtain the theoretical amounts of material deposited in various regions of the lungs as
well as the relative amounts in various fractions of the expired air. Similarly, it is possible
to estimate the probable fate of a particle which passes through the nasal passages.
Experiments have been carried out to verify a number of these predictions. On the whole, the
agreement, as illustrated in the slides, is fairly satisfactory when one considers the
complexity of the calculations.

15. An Application of Biometrics to Zoological Classification. F. M. Wadley,
Navy Department, Washington, D. C.

Statistical problems in taxonomy are discussed; attention must be paid to variation of
individuals as well as of group means. Covariance analysis and the discriminant function
technique are applied to multiple measurements in groups of molluscan fossils.

16. The Analysis of Hematological Effects of Chronic Low-Level Radiation.
Jack Moshman, United States Atomic Energy Commission, Oak Ridge, Tennessee.





Several methods are [given] for analyzing the possible effects of chronic low-level
irradiation upon the employees of the operating contractors of the US AEC. The effects
investigated are those on the red blood count, hemoglobin, white blood count, lymphocytes
and neutrophils. The analysis includes measurements of significant differences among
individuals, geographic sites and the exploration of various indices of exposure to radiation.
A non-parametric determination of trend values for individuals which may be applied
to mass data is considered.

17. Statistical Problems in Psychological Testing. Edward E. Cureton,
University of Tennessee.

Though great progress has been made in mathematical statistics in recent years, a number
of the major statistical problems encountered in the development and use of psychological
tests remain unsolved. Some of these problems are outlined, with particular reference to
the mathematical models and assumptions justified by psychological theory, by the nature
of the experimental data, and by the conditions under which the results and findings are
to be applied.


18. Accuracy of a Linear Prediction Equation in a New Sample. George E.
Nicholson, Jr., University of North Carolina.


The problem considered is as follows. Given two samples $S_1$ and $S_2$ of $N_1$ and $N_2$
observations on a $p + 1$ character random variable $(y, x_1, \ldots, x_p)$. Let $Y_1$ and $Y_2$ be the linear
regression equations computed by the method of least squares from each sample. The effect of
using $Y_1$ to predict the $y$'s in $S_2$ is considered. The ratio

$$k = \frac{\sum (y_2 - Y_2)^2}{\sum (y_2 - Y_1)^2}$$

is used as a measure of the predicting efficiency of $Y_1$ in $S_2$ relative to $Y_2$ when the $x_i$ are
fixed for the usual regression model. The general multivariate case is also considered.
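
A minimal sketch of the comparison described in this abstract, for a single predictor. It assumes the efficiency ratio is the residual sum of squares in the second sample of the line fitted to the second sample, divided by that of the line fitted to the first sample. This reading, the function names and the data are illustrative assumptions and do not come from the paper.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_ss(line, xs, ys):
    """Sum of squared prediction errors of a fitted line on (xs, ys)."""
    intercept, slope = line
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def prediction_ratio(sample1, sample2):
    """Assumed efficiency ratio of the sample-1 line when used in sample 2."""
    line1 = fit_line(*sample1)
    line2 = fit_line(*sample2)
    return residual_ss(line2, *sample2) / residual_ss(line1, *sample2)


if __name__ == "__main__":
    # Hypothetical samples given as (x values, y values).
    s1 = ([0, 1, 2, 3, 4], [0.1, 1.9, 4.2, 5.8, 8.1])
    s2 = ([0, 1, 2, 3, 4], [1.0, 2.7, 5.2, 7.1, 8.8])
    print(prediction_ratio(s1, s2))
```

Because the line fitted to the second sample minimizes the residual sum of squares there, a ratio defined this way cannot exceed 1.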


19. Independence of Quadratic Forms in Normally Correlated Variables.
Yukiyosi Kawada, Tokyo University of Literature and Science, Tokyo, Japan.

An extension is given of theorems of Craig, Hotelling and Matérn which includes the
following theorem, proved by a new method: If two quadratic forms $Q_1$, $Q_2$ in normally and
independently distributed variates with zero means and unit variances satisfy the four
conditions $E(Q_1^i Q_2^j) = E(Q_1^i) E(Q_2^j)$, for $i, j = 1, 2$, then the product of the matrices of the
two forms in either order is zero.
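
The first of the four moment conditions can be checked numerically with a short sketch. It relies on the standard identity $\mathrm{Cov}(x'Ax, x'Bx) = 2\,\mathrm{tr}(AB)$ for symmetric $A$, $B$ and independent standard normal variates; the helper names and the matrices are illustrative and are not taken from the paper.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def cov_quadratic_forms(a, b):
    """Cov(x'Ax, x'Bx) = 2 tr(AB) for x ~ N(0, I) and symmetric A, B."""
    return 2.0 * trace(matmul(a, b))


if __name__ == "__main__":
    # Forms on disjoint coordinates: AB = 0 and the covariance vanishes.
    a = [[1, 0], [0, 0]]
    b = [[0, 0], [0, 1]]
    print(cov_quadratic_forms(a, b), matmul(a, b))

    # Zero covariance without AB = 0, so the (1, 1) condition alone
    # does not force the product of the matrices to vanish.
    c = [[1, 0], [0, -1]]
    d = [[0, 1], [1, 0]]
    print(cov_quadratic_forms(c, d), matmul(c, d))
```

The second pair shows why a single moment condition is weaker than the conclusion of the theorem, which is consistent with the abstract requiring all four.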


20. Bounds on the Distribution of Chi-square. S. A. Vora, University of North 
Carolina. 

Let

$$\chi^2 = \sum_{i=1}^{k} (\nu_i - np_i)^2/np_i, \qquad \chi'^2 = \sum_{i=1}^{k} (\nu_i + \tfrac{1}{2} - Np_i)^2/Np_i,$$

where $\nu_i \ge 0$, $\sum \nu_i = n$, $p_i > 0$, $\sum p_i = 1$ and $N = n + k/2$. Bounds on the multinomial
probability $T$ in terms of $\chi'^2$ are obtained. A triangular transformation of
r, == (t'l -b -1 — E'pi)/[Kpt{l — 


(i = 1, • , t - Ij 





to $y_i$ is applied so that


/ -1 



where $d$ is determined later by equating the coefficients of $\chi'^2$. Certain rectangles $r(\nu)$
with $(y_1, \ldots, y_{k-1})$ as mid-point are non-overlapping and cover the entire space $R_{k-1}$
for $\nu_i = 0, \pm 1, \pm 2, \ldots$. If $\chi'^2 = c$, then bounds on $T$ in terms of the integral of the $(k - 1)$
dimensional normal frequency function over the rectangle $r(\nu)$ are obtained. Prob $\{\chi'^2 \le c\}$
is the sum of $T$ over $\chi'^2 \le c$, so the integral over the sum of rectangles whose mid-points
lie within the hypersphere $\chi'^2 \le c$ is considered. Two hyperspheres, one which contains the
sum of those rectangles, and one which is contained in it are used for the bounds, giving

$$\lambda_1 F_{k-1}(c_1) \le \text{Prob}\,\{\chi'^2 \le c\} \le \lambda_2 F_{k-1}(c_2),$$

where $F_{k-1}(c)$ is a chi-square distribution function with $(k - 1)$ degrees of freedom and
$\lambda_1$, $\lambda_2$, $c_1$, $c_2$ are functions of $c$, $n$, $k$ and $p_1, \ldots, p_k$. As $n \to \infty$, both bounds tend to
$F_{k-1}(c)$. Bounds of the same form are obtained for Prob $\{\chi^2 \le c\}$. Closer bounds
for Prob $\{\chi'^2 \le c\}$ are given in terms of a non-central chi-square distribution.
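
As a small illustration of the two statistics compared in this abstract, the sketch below computes the usual chi-square and a shifted version in which each cell count is increased by one half and $n$ is replaced by $N = n + k/2$. The exact form of the shifted statistic is an assumed reading of the abstract, and the function names and counts are hypothetical.

```python
def chi_square(counts, probs):
    """Usual statistic: sum of (v_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((v - n * p) ** 2 / (n * p) for v, p in zip(counts, probs))


def chi_square_shifted(counts, probs):
    """Assumed shifted statistic: counts + 1/2 and N = n + k/2 in place of n."""
    big_n = sum(counts) + len(counts) / 2.0
    return sum((v + 0.5 - big_n * p) ** 2 / (big_n * p)
               for v, p in zip(counts, probs))


if __name__ == "__main__":
    counts = [10, 20, 30]          # hypothetical cell counts
    probs = [1 / 6, 1 / 3, 1 / 2]  # hypothesized cell probabilities
    print(chi_square(counts, probs), chi_square_shifted(counts, probs))
```

Adding one half to each of the $k$ cells raises the total to $n + k/2$, which is why $N$ takes the place of $n$ in the shifted statistic.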

21. Estimation of Genetic Parameters. C. R. Henderson, Cornell University.


Many applications of genetics and statistics to the improvement of plants and animals
deal with experimental data for which the underlying model is assumed to be

$$y_\alpha = \sum_{i=1}^{p} b_i x_{i\alpha} + \sum_{i=1}^{q} u_i z_{i\alpha} + e_\alpha,$$

where $b_i$ are unknown fixed parameters, $x_{i\alpha}$ and $z_{i\alpha}$ are observable parameters, the $u_i$ are
a random sample from a multivariate normal distribution with means zero and covariance
matrix $\|\sigma_{ij}\|$, and the $e_\alpha$ are normally and independently distributed with means zero
and variances $\sigma_\alpha^2$. If $\sigma_{ij} = 0$ when $i \ne j$ and if $\sigma_\alpha^2 = \sigma_e^2$, the model is the one usually
assumed when components of variance are estimated.

Three different estimation problems are involved: (1) estimation of $b_i$ under the
assumptions of the model, (2) estimation of $u_i$ and (3) estimation of $\sigma_{ij}$. The first two problems
are not solved satisfactorily by the least squares procedure in which the $u_i$ are regarded
as fixed, but the maximum likelihood solution does lead to a satisfactory estimation
procedure.

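Assuming that the $\sigma_{ij}$ and $\sigma_\alpha^2$ are known, the joint maximum likelihood estimates of
$b_i$ and $u_i$ are the solution to the set of linear equations

$$\sum_{i=1}^{p} b_i \Big(\sum_\alpha x_{i\alpha} x_{h\alpha}/\sigma_\alpha^2\Big) + \sum_{i=1}^{q} u_i \Big(\sum_\alpha x_{h\alpha} z_{i\alpha}/\sigma_\alpha^2\Big) = \sum_\alpha x_{h\alpha} y_\alpha/\sigma_\alpha^2, \qquad h = 1, \ldots, p,$$

$$\sum_{i=1}^{p} b_i \Big(\sum_\alpha x_{i\alpha} z_{h\alpha}/\sigma_\alpha^2\Big) + \sum_{i=1}^{q} u_i \Big(\sum_\alpha z_{i\alpha} z_{h\alpha}/\sigma_\alpha^2 + \sigma^{ih}\Big) = \sum_\alpha z_{h\alpha} y_\alpha/\sigma_\alpha^2, \qquad h = 1, \ldots, q,$$

where $\sigma^{ih}$ denotes an element of the inverse of $\|\sigma_{ij}\|$.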
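
A minimal sketch of solving equations of this type, under simplifying assumptions that are not part of the abstract: a common error variance, uncorrelated random effects with a common variance (so the inverse covariance matrix is diagonal), and a small hypothetical data set. The function names are illustrative.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mixed_model_solve(x_rows, z_rows, y, var_e, var_u):
    """Joint solution for fixed effects b and random effects u.

    Assumes a common error variance var_e and independent random effects
    with common variance var_u (illustrative simplifications).
    """
    p, q = len(x_rows[0]), len(z_rows[0])
    rows = [xr + zr for xr, zr in zip(x_rows, z_rows)]  # combined design rows
    dim = p + q
    coef = [[sum(r[i] * r[j] for r in rows) / var_e for j in range(dim)]
            for i in range(dim)]
    for i in range(p, dim):
        coef[i][i] += 1.0 / var_u  # inverse covariance of the random effects
    rhs = [sum(r[i] * obs for r, obs in zip(rows, y)) / var_e
           for i in range(dim)]
    solution = solve(coef, rhs)
    return solution[:p], solution[p:]


if __name__ == "__main__":
    # Hypothetical data: an overall mean and two random group effects.
    x_rows = [[1], [1], [1], [1]]
    z_rows = [[1, 0], [1, 0], [0, 1], [0, 1]]
    y = [10.0, 12.0, 14.0, 16.0]
    print(mixed_model_solve(x_rows, z_rows, y, var_e=1.0, var_u=1.0))
```

In this example the group means are 11 and 15 around an overall mean of 13, and the estimated random effects come out as about -1.33 and 1.33 rather than -2 and 2. The extra diagonal term in the second set of equations is what pulls them toward zero.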


Some important applications of this estimation procedure to genetic studies are described
and certain computational short-cuts are suggested.

The problem of estimating $\sigma_{ij}$ has not been solved satisfactorily although under certain
quite general assumptions the equations for the joint estimation of $b_i$, $u_i$, $\sigma_{ij}$, and $\sigma_\alpha^2$
can easily be written. The solution to the equations, however, is too difficult to make the
procedure practical. Nevertheless unbiased estimates of $\sigma_{ij}$ can be obtained by equating
to their expected values the differences between certain reductions in sums of squares
computed by least squares and solving for the $\sigma_{ij}$. In general, the expectation of the
reduction due to $b_1, \ldots, b_p, u_1, \ldots, u_s$ $(s \le q)$ is $\sum_i \sum_j c^{ij} E(Y_i Y_j)$, where $c^{ij}$ are the elements
of the matrix which is the inverse of the $(p + s)^2$ matrix of coefficients and the $Y_i$ are the
right members of the least squares equations.

22. Estimating the Mean and Standard Deviation of Normal Populations from
Double Truncated Samples. A. C. Cohen, Jr., University of Georgia.

The method of maximum likelihood is employed to obtain estimates of the mean and
standard deviation of a normally distributed population from double truncated random
samples. Two cases are considered. In the first, the number of missing variates is assumed
to be unknown. In the second, the number of missing (unmeasured) variates in each tail is
known. Variances for the estimates involved in each case are obtained from the maximum
likelihood information matrices. A numerical example is given to illustrate the practical
application of the estimating equations obtained for each of the two cases considered.

23. Minimax Estimates of Location and Scale Parameters. Gopinath Kallianpur,
University of North Carolina.


If the joint fr. f. of the random variables $X_1, \ldots, X_N$ contains only a scale parameter
and is of the form

$$\frac{1}{\alpha^N}\, p\!\left(\frac{x_1}{\alpha}, \ldots, \frac{x_N}{\alpha}\right),$$

then under mild restrictions the following theorem is proved:

( (X — <t\ 

Tnnoiii'iM 1 // t/ic Ions fitiiclioii as of the form IK 1 -1 > ti 

Soir) nf « minimizis 


Ihi' hcLl or vitniina.i esliimilc 


■ •?) 


11 r (, u and farther, 

aoOuTi > . arK) = (iSoGi . • > ai.v), a > 0 

When both location and scale parameters are present and the joint fr. f. is of the form

$$\frac{1}{\alpha^N}\, p\!\left(\frac{x_1 - \theta}{\alpha}, \ldots, \frac{x_N - \theta}{\alpha}\right),$$

(under conditions similar to those in Theorem 1) we obtain two results for the estimation
of $\theta$ and $\alpha$, respectively, one of which is


Theorem 2 If the loss function ts of the form W 




the. hi'sl csLiinalr, 0o(x) of 0 minimizes 


r r u'C— 

J-„ Jo \ a f as- \ a a ! 

, „ /ll + TJV + WdfXl , 

and Oo I —-— , • ■ , — j --- 


, xvl + k 


These theorems have been applied to derive minimax estimates in the case of standard
distributions. Finally, the problem of estimating the difference between the location
parameters of two populations is briefly considered. The results obtained in this paper are
a continuation of the line of approach suggested in Theorem 5 of Wald's “Contributions
to the Theory of Statistical Estimation and Testing Hypotheses” (Annals of Math. Stat.,
Vol. 10 (1939), pp. 299-326). (The present work was carried out under Office of Naval
Research contract.)

24. On Some Features of the Neyman-Pearson and the Wald Theories of Statistical
Inference, Their Interrelations and Their Bearing on Some Usual Problems
of Statistical Inference. S. N. Roy, University of North Carolina.

With two alternative hypotheses $H_1$ and $H_2$ it is shown that (i) the most powerful test
of $H_1$ with respect to $H_2$ is automatically an unbiased test in the sense that its power is
never less than (and usually greater than) the level of significance $\alpha$ and (ii) there is also
a least powerful test with its power not greater (usually less) than $\alpha$. This means that all
tests have powers lying in between, which gives a complete picture of the possible family
of tests and provides a basis for defining efficiency of tests.

With the first kind of error $\alpha$ is tied up a minimum second kind of error $\beta$
(complementary to the maximum power $P$), and the level at which $\alpha$ is fixed depends upon some
compromise between $\alpha$ and $\beta$. This intuitive approach is formalised by the introduction of
loss functions related to and a priori probability weights for $H_1$ and $H_2$, thus leading to
the first stage in the Wald treatment of dichotomy with two solutions in the observation
space corresponding respectively to minimum and maximum total risks. This is
immediately generalised to the first stage in the Wald treatment of multichotomy with minimum
and maximum total risk solutions. An important special case is discussed in which all the
possible alternatives to a particular hypothesis are, by our test procedure,
indistinguishable among themselves, thus effectively forming only one alternative to the hypothesis,
which means a degenerate multichotomy. The bearing of this on most powerful tests on
an average under the Neyman-Pearson theory is also discussed.

The problem of testing a composite hypothesis which is usually treated in terms of the
Neyman-Pearson theory is posed and treated in terms of the (first stage) Wald theory and
an indication is given of how these notions could be applied to the usual problems of
univariate and multivariate analysis.

25. Note on Uniformly Best Unbiased Estimates. R. C. Davis, Naval Ordnance
Test Station, Inyokern, California.


For the estimation in an absolutely continuous probability distribution of an unknown
parameter which does not possess a sufficient statistic, it is shown that no unbiased
estimate for the unknown parameter exists which attains minimum variance uniformly over
a parameter set of arbitrary nature. This result demonstrates the impossibility of obtaining
a generalized sufficient statistic first proposed by Bhattacharyya. Although not used
in this note it is surmised that Barankin's powerful results on locally best unbiased
estimates can be applied to yield further results in this direction.

26. Competitive Estimation. Herbert Robbins, University of North Carolina.

Let $\theta$ be a vector random variable with distribution function $G(\theta)$ and let $x$ be a vector
random variable whose frequency function $f(x; \theta)$ depends on $\theta$. Two statisticians, A and B,
are required to estimate $\theta$ from the value of $x$. If A's estimate is closer to $\theta$ he wins one
dollar from B, and vice versa; in case of a tie no money changes hands. It is shown that A
should estimate $\theta$ by the function $a(x)$ = median of posterior distribution of $\theta$ given $x$;
his expected gain will then be $\ge 0$ whatever estimate B may use. If $G(\theta)$ is not known to A
he should estimate it from the series of values of $\theta$ which have been observed in previous
trials. If these are not known, A should estimate $G(\theta)$ from the values of $x$ which have
previously occurred; how this may be done is discussed elsewhere (see Abstract 35).

From the point of view of the theory of games, when $G(\theta)$ is unknown we have a game in
which the “rules” are unknown and must be successively estimated from past experience.
Other examples arise whenever a game involves random devices whose probability
distributions are not known to the players but must be inferred by statistical methods, in
general from secondary variables which contain only part of the total information. The
role of statistical inference in such “long term” games is fundamental.
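
The posterior-median rule described in this abstract can be sketched for the simplest case of a scalar parameter with a known discrete prior. The prior, the binomial-type likelihood and the function names below are illustrative assumptions and are not taken from the paper.

```python
def posterior_median(thetas, prior, likelihood, x):
    """Smallest theta whose cumulative posterior weight reaches one half.

    thetas and prior describe a discrete prior distribution; likelihood(x, t)
    returns the frequency function of x at theta = t, up to a constant factor.
    """
    weights = [p * likelihood(x, t) for t, p in zip(thetas, prior)]
    total = sum(weights)
    running = 0.0
    for t, w in sorted(zip(thetas, weights)):
        running += w
        if running >= total / 2.0:
            return t
    return max(thetas)  # fallback in case of rounding in the running sum


def successes_likelihood(x, t, trials=10):
    """Likelihood of x successes in a fixed number of trials (constant dropped)."""
    return t ** x * (1.0 - t) ** (trials - x)


if __name__ == "__main__":
    thetas = [0.2, 0.5, 0.8]     # hypothetical parameter values
    prior = [1 / 3, 1 / 3, 1 / 3]
    print(posterior_median(thetas, prior, successes_likelihood, 7))
    print(posterior_median(thetas, prior, successes_likelihood, 5))
```

With 7 successes in 10 trials most of the posterior weight falls on 0.8, and with 5 successes it falls on 0.5, so those are the estimates returned.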

27. The Effect of an Unknown ‘Location Disturbance’ on “Student’s” t based
on a Linear Regression Model. Uttam Chand, Boston University.


Consider $y_1, \ldots, y_{N_1}, y_{N_1+1}, \ldots, y_N$, a set of observations ordered in time. If the
$y$'s are normally and independently distributed according to $N(\alpha + \beta(i - 1), \sigma^2)$ and we
want to find out if the $y$'s have changed with time, we usually employ a “Student's” $t$ type
of statistic with $N - 2$ degrees of freedom. If, as a consequence of the impact of a certain
unknown political or economic change in the past on the $y$'s, the $y$'s actually constitute
two independent, normal samples $y_1, \ldots, y_{N_1}$; $y_{N_1+1}, \ldots, y_N$ distributed according to
$N(m_1, \sigma^2)$, $N(m_2, \sigma^2)$ respectively, a two-sample “Student's” $t$ also based on $N - 2$ degrees
of freedom would be the appropriate statistic to use for the hypothesis $m_1 = m_2$. If, in
fact, the latter situation describes the correct state of affairs, and the statistician employs
the “Student's” $t$ based on the regression model, he commits an error. The present paper
investigates the nature of such an error in the light of the point of impact as determined
by the magnitude of $N_1$ and the intensity of the impact as determined by the standardized
‘distance’ of this extraneous ‘shock’ on the ordered set of observations $y$.


28. Corrections for Non-normality for the Two-sample t and the F Distributions
Valid for High Significance Levels. Ralph A. Bradley, McGill University.

The effects of non-normality of the parent population on common tests of significance
have long been of concern in the application of statistical methods to experimental data.
In this paper, the two-sample $t$-statistic is expressed as a simple multiple of the cotangent
of an angle between two lines in a space of dimensionality one less than the total of the
sample sizes; the $F$-statistic for $k$ samples is expressed as a multiple of the cotangent of
an angle between a line and a plane of $(k - 1)$ dimensions in a space, again, of
dimensionality one less than the total of the sample sizes. The geometrical formulation is such as to
suggest approximations to the distributions of these statistics valid for large values of
the statistics, and these approximations are obtained. The approximations are shown to be
exact in the special cases where the parent population is normal, and a method of
evaluation of correction factors is given for a wide class of parent populations. The approximation
procedures are valid for the distributions under both null and non-null hypotheses.

29. Some Tests Based on the Empirical Distribution Function. (Preliminary
Report). James F. Hannan, University of North Carolina.

Let $X = (X_1, X_2, \ldots, X_n)$ be an independent sample of $n$ where $X_i$ has the continuous
c.d.f. $F(x)$. Let $S_n(x)$ be the empirical distribution function. Acceptance regions of
the type $[X \mid S_n(x) < \phi(x)$ for all $x]$ are considered for different specifications of $\phi$ and their
probabilities evaluated. The method of evaluation consists in identifying the regions with
regions defined in terms of the order statistics of a sample of $n$ from the uniform
distribution on the interval (0, 1). The result obtained for $\phi(x) = F(x) + c/n$, $0 < c$, integral $< n$,
is used to provide a direct proof of the Kolmogoroff result

$$\lim_{n \to \infty} P\big[n^{1/2} \sup_x \,(S_n(x) - F(x)) < z\big] = 1 - e^{-2z^2},$$

while that obtained for $\phi(x) = F(x) + t$, $0 < t < 1$, gives the exact c.d.f. of the statistic
$\sup_x (S_n(x) - F(x))$.
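
The one-sided statistic in this abstract can be computed directly from the order statistics, since for a continuous $F$ the supremum of $S_n(x) - F(x)$ is attained at a sample point. The sketch below is illustrative: the function names and the sample are hypothetical, and the limiting formula is the one quoted in the abstract.

```python
import math


def one_sided_sup(sample, cdf):
    """sup over x of S_n(x) - F(x), evaluated at the ordered sample points."""
    ordered = sorted(sample)
    n = len(ordered)
    return max((i + 1) / n - cdf(x) for i, x in enumerate(ordered))


def limiting_cdf(z):
    """Limit of P[sqrt(n) * sup (S_n - F) < z] for z >= 0, as quoted above."""
    return 1.0 - math.exp(-2.0 * z * z)


def uniform_cdf(x):
    """C.d.f. of the uniform distribution on (0, 1)."""
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    sample = [0.1, 0.4, 0.7]  # hypothetical observations
    d_plus = one_sided_sup(sample, uniform_cdf)
    print(d_plus, limiting_cdf(math.sqrt(len(sample)) * d_plus))
```

For this sample the three candidate values are 1/3 - 0.1, 2/3 - 0.4 and 1 - 0.7, so the statistic is about 0.3. Applying the limiting formula to so small a sample is only for illustration.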

30. On a Generalization of the Behrens-Fisher Problem. (By Title). John E. 
Walsh, Rand Corporation, Santa Monica, California. 

Let $m + n$ independent observations be available where it is only known that a specified
$m$ of them are from continuous symmetrical populations with common median $\mu$ while the
remaining $n$ are from continuous symmetrical populations with common median $\nu$. This is
the generalization of the Behrens-Fisher problem investigated; some tests and confidence
intervals for $\mu - \nu$ which are valid for the generalized situation are presented. For
definiteness, suppose that $n \le m$. The procedure used is to subdivide the $m$ observations (common
median $\mu$) into $n$ groups of nearly equal size and form the mean of the observations for
each group. Pair the $n$ means with remaining $n$ observations and subtract the value of
each observation from the value of the mean with which it is paired. The resulting $n$ values
represent independent observations from populations with common median $\mu - \nu$. Tests
and confidence intervals for $\mu - \nu$ are obtained by applying the results of “Applications
of Some Significance Tests for the Median Which are Valid Under Very General
Conditions” (Jour. Amer. Stat. Assn., Vol. 44 (1949), pp. 342-66) to these $n$ values. To measure
the “information” lost by using the generalized tests when one actually has two
independent samples from normal populations, power efficiencies are computed with respect
to: (a) Scheffé's “best” $t$-test solution and (b) most powerful solution when ratio of
variances is known. Case (a) yields an upper bound while case (b) furnishes a lower bound
for the actual efficiency.
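
The grouping and pairing step described in this abstract can be sketched as follows. The split into consecutive blocks in the order given is an illustrative choice (any grouping fixed without reference to the observed values would serve), and the function name and data are hypothetical.

```python
def paired_differences(first, second):
    """Group means of the larger sample minus the paired smaller-sample values.

    first holds the m observations with common median mu, second the n
    observations with common median nu, with n <= m. The m observations are
    split into n consecutive groups of nearly equal size.
    """
    m, n = len(first), len(second)
    if n == 0 or n > m:
        raise ValueError("need 0 < n <= m")
    base, extra = divmod(m, n)
    differences = []
    start = 0
    for j in range(n):
        size = base + (1 if j < extra else 0)
        group = first[start:start + size]
        start += size
        differences.append(sum(group) / size - second[j])
    return differences


if __name__ == "__main__":
    # Hypothetical observations, listed in the order in which they were taken.
    first = [1, 2, 3, 4, 5, 6, 7]
    second = [0, 1, 2]
    print(paired_differences(first, second))
```

Here the seven observations are split into groups of sizes 3, 2 and 2 with means 2, 4.5 and 6.5, giving the differences 2, 3.5 and 4.5. A test for the median of these values would then be applied as the abstract describes.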

31. Construction of Partially Balanced Designs with two Accuracies. (By Title).
S. S. Shrikhande, University of North Carolina and Nagpur College, Nagpur,
India.

Various methods of construction of partially balanced designs first introduced by Bose
and Nair (Sankhyā, Vol. 4 (1939), pp. 337-373) have been considered. Two of the methods
given are generalisations of a difference theorem given by them. Another method is the
inversion of an unreduced balanced incomplete block design with $k = 2$. Use has also been
made of the existing balanced incomplete block design in another direction. A number of
designs can also be obtained by methods of finite geometries and especially by omitting a
number of treatments and certain blocks from the complete lattice designs. Use of curves
and surfaces in finite geometries and the use of multifactorial designs given by Plackett
and Burman (Biometrika, Vol. 33 (1946), pp. 306-325) are also indicated.

32. Designs for Two-way Elimination of Heterogeneity. (By Title). S. S.
Shrikhande, University of North Carolina and Nagpur College, Nagpur,
India.

Use has been made of the existing balanced and some partially balanced designs for
two-way elimination of heterogeneity with two accuracies. Particular cases of these
designs were given by Youden (Contributions from Boyce Thompson Institute, Vol. 9 (1937),
pp. 317-326) and Bose and Kishen (Science and Culture (1939), pp. 136-137). The method
depends upon interchanging the positions of various treatments in the different columns
(blocks), if necessary, so as to satisfy certain conditions.

33. Designs for Animal Feeding Experiments. (By Title). S. S. Shrikhande,
University of North Carolina and Nagpur College, Nagpur, India.

In animal-feeding experiments change-over designs are generally preferable to continuous
feeding experiments. In change-over designs both the direct and carry-over treatment
effects are important. Use of balanced and partially balanced incomplete block designs
toward this end has been considered.

34. A Truncated Sequential Procedure for Interval Estimation, with Applications
to the Poisson and Negative Binomial Distributions. (Preliminary Report).
(By Title). D. Martin Sandelius, University of Uppsala, Sweden, and
University of Washington.

Let $x, y_1, y_2, \ldots$ be a sequence of random variables defined in $(0, \infty)$, and let $n$ be the
smallest integer satisfying $\sum_{i=1}^{n+1} y_i > tx$ where $t > 0$ is a non-random quantity. Define $u_k$
either as $\sum_{i=1}^{k} y_i/x$ or as the smallest integer exceeding $\sum_{i=1}^{k} y_i/x$, $k = 1, 2, \ldots$. Given the
distribution function $F(x, \theta)$ of $x$ and, for any $t$, the conditional distribution of $n$ with
respect to $x$, the distribution of $u_k$ is obtained. The problem is to determine a confidence
interval for $\theta$ with confidence coefficient $1 - \alpha$ on the basis of either an observation on
$u_k$, if $u_k \le t$, or an observation on $n$, if $n \le k - 1$. The following procedure is proposed.
If $u_k \le t$, choose $\theta_{10}$ and $\theta_{11}$ according to a rule satisfying Prob $(\theta_{10} \le \theta \le \theta_{11} \mid u_k \le t) \ge
1 - \alpha$. If $n \le k - 1$, choose $\theta_{20}$ and $\theta_{21}$ such that Prob $(\theta_{20} \le \theta \le \theta_{21} \mid n \le k - 1) \ge 1 - \alpha$.
For continuous $u_k$, the following cases are discussed: a) $x = \theta$ with probability 1, and $n$
has, for any $t$, a Poisson distribution with mean $t\theta$; b) $x$ has a Gamma distribution with
mean $\theta$, and the conditional distribution of $n$ with respect to $x$ is, for any $t$, a Poisson
distribution. Both cases may, for instance, be applied to bacterial counting.

35. A Generalization of the Method of Maximum Likelihood: Estimating a
Mixing Distribution. (Preliminary Report). (By Title). Herbert Robbins,
University of North Carolina.

Let $\theta$ be a vector random variable with distribution function $G(\theta)$ belonging to some
class $\Omega$, let $x$ be a vector random variable whose frequency function $f(x; \theta)$ depends on $\theta$,
and let $g(x) = \int f(x; \theta)\, dG(\theta)$ be the resulting frequency function of $x$. From a sample
$x_1, x_2, \ldots, x_n$ it is required to estimate $G(\theta)$. The generalized method of maximum
likelihood consists in using the estimate $G_n(\theta; x_1, \ldots, x_n)$ in $\Omega$ for which $\prod g(x_i)$ is a
maximum. Under certain restrictions this method is consistent as $n \to \infty$.

Any consistent method of estimating the mixing distribution $G(\theta)$ from the sequence
$x_1, x_2, \ldots$ yields a solution of parametric statistical decision problems in the following
manner: from past values $x_1, \ldots, x_{n-1}$ we estimate $G(\theta)$, and then use the corresponding
Bayes solution of the decision problem to reach our decision for $x_n$, even though the value
$\theta_n$ which produced $x_n$ is different from those which produced $x_1, \ldots, x_{n-1}$. In certain cases
of long-term experimentation this approach seems more reasonable than the minimax
method which decides on the course of action appropriate to $\theta_n$ on the basis of $x_n$ only,



and ignores the information about the distribution of $\theta$ contained in $x_1, \ldots, x_{n-1}$.


3fi. Smallest Average Coafidence Sets lor ,,j Uiiiv<'i»ty f'if ^ 

Normal Means. (By Title). Ragiiu IIA. .i..,,. ,1 ‘"'1 


.. 


\ 


Let ti = (tii , , xini ; ■ • , Xi, , ' 7.Jnuila'''""' ' 

s.amplps of sizes ni , ni , , nj, from noriiift' I 

IT, havinp; mean ji. and variance 0 -; '^Hy ji.'irftP'elCt' . 


srj 


JCuclifIpan fipaco of all points ju by R- 


= (<ri 


<r/,). and any sot-valiiod f'”’f ,1 / « 

fwhic.h sallfi'"^’’ , fniHC. ii'id fb' > 

/(») 


(hr i 
A 

,,..51. 




IjM" 


having subsets of R as ita values (which * 


-4'^' 


( Sr'l? 


Ei^i bf-'iili 

«(/ I Mj c) = probability of the statement 'V ponslructiut;/** ^ *i 

LobrsKuo measure of /(y). We considci ^ytiKs olitait'**'* n 

O' and /3 "as small as possible” One ^ 




ir*’ 


{■(;> 

V, 0 < p < 1, Ict/Jft^w = .. 

f. = iir'Sp' rr., , .< = nr^El" (*„ - ^ f 'J?”’ V - -S n, I. Then 

constants., and f(p) being dotorminod by r(xi'^(,f [iccilnm ‘ m,) js any «d!lt>r 

pendent ehi-squarc variables with k, N — ' q < c < “ > ' o (y) dtfTor by 

(a) ohviouslYmf/!; f(p) I (I, cX) = p forall/iai'fl* (,ithcr (») /(«) irf(/v.r(p, U. 

function such that «(/ | p, cX) < p for all i» ah*! ft ^ | cy)) > suP 

a sot of measure zero for almost every a, or (d) 

for every a. 


NEWS AND NOTICES


item 


gentral irUonst 


r Ike 

Readers are invited l(> submit to the Secrelarti oj 

Personal IteiPS Special 

, f ihe Gnncrvispi' of ilie 

Mr Harry 11. Goode, formerly head of ^ _york, is noW b P jyjjchigan, 

Device Center, Office of Naval Research 1, (^^gnter, HnDersi y 
Aero-Physics Group, Aeronautical Ecseard jolms Hopkins 

Ann Arbor, Michigan ..clyemploy'^'^ by Mathe- 

Mi. William G. Howard, who was previo piesently o^P QongfosS' 
University, Institute for Cooperative Researo^^ Library ° tistioio.n in the 

matical Riatistieian m the Air Studies o, position as ^ Q^a. Division 

Mi.ss Margaret Kampschaefer has acoop ® ^ project, jyjaihe- 

H. S. Burea.u of Labor Statistics, employ®^ division, Ghi' 

of ICmployment and Security She was Naval Roac o 

matician at the Argonne National Labora o 
cago, Illinois. 


Dr. Albert Noack has recently been appointed Professor of Mathematics at the
University of Koeln, Germany.

Second Berkeley Symposium on Mathematical Statistics and Probability

The Second Berkeley Symposium will be held at the Statistical Laboratory,
University of California, Berkeley, from July 31 to August 12, 1950, with the
cooperation of the American Statistical Association (Biometrics Section), the
Biometric Society (Western North American Region), the Econometric Society,
the Institute of Mathematical Statistics, the Institute of Transportation and
Traffic Engineering (UC), and the Office of Naval Research.

The Symposium will include sessions on mathematical statistics, probability,
biometrics, econometrics, traffic engineering, astronomy, and physics. The
complete program may be obtained from the Statistical Laboratory. The papers will
be published by the University of California Press as the Proceedings of The
Second Symposium.

Cumulative Index of Volumes 1-20 

Attention is called to the fact that there is now available a cumulative index
for Volumes 1 through 20 (1930-1949) of the Annals of Mathematical Statistics.
Copies may be secured from the office of the Secretary-Treasurer for $1.00 per
copy.

New Members

The following persons have been elected to membership in the Institute
(December 1, 1949 to February 28, 1950)

Bain, John C., B.A. (Univ. of Toronto), President's Statistician, Abitibi Power
& Paper Company, Ltd., 408 University Avenue, Toronto 2, Ontario,
Canada.

Blakemore, George J., Jr., A.B. (George Wash. Univ.), Student at George
Washington University, 17^8 Hobart St., NW, Washington 10, D. C.

Bross, Irwin D. J., Ph.D. (North Carolina State College), Research Associate,
Department of Biostatistics, School of Public Health, The Johns Hopkins
University, 615 North Wolfe Street, Baltimore 5, Maryland.

Cansado Maceda, Enrique, Ph.D. (University of Madrid), Assistant Professor
of Mathematical Statistics, Faculty of Sciences, University of Madrid and
Official of the National Institute of Statistics, Paseo de Rosales, 50, Madrid,
Spain.

Clatworthy, Willard H., M.A. (Univ. of Kentucky), Student at the University
of North Carolina, Box 168, Chapel Hill, North Carolina.

Dinsmore, Robert J., A.B. (Univ. of Calif.), Student at the University of
California, Berkeley, California, 84^8 Milvia St., Berkeley 4, California.

Enell, John W., Eng. Sc.D. (New York Univ.), Assistant Professor of
Administrative Engineering, New York University, 71 Ayers Court, West Englewood,
New Jersey.

Flores, Anna M., M.Sc. (Univ. of Mexico), Mathematician, Torres Adalid
a 511, Mexico City.

Gamer, Norman R., B.A. (Univ. of Rochester), Graduate Student at University
of North Carolina, 15 Goldston Ave., Carrboro, North Carolina.

Hannan, James F., M.A. (Harvard), Research Assistant, Department of
Mathematical Statistics, University of North Carolina, P. O. Box 168, Chapel
Hill, North Carolina.





Klein, Joseph, B.S. (Rutgers), Graduate Student at Rutgers University, P. O.
Box 501, Red Bank, New Jersey.

Lewis, Evan J., Ph.D. (Cornell Univ.), Physicist, Corning Glass Works, Corning,
New York.

Palekar, Madhukar N., B.S. (Bombay), Graduate Student in Department of
Mathematical Statistics and Departmental Assistant, 108 Furnald Hall,
Columbia University, New York 27, New York.

Page, Woodrow W., M.A. (Oklahoma Univ.), Graduate Student, University of
North Carolina, 34-1 Jackson Circle, Chapel Hill, North Carolina.

Pretorius, S. J., Ph.D. (Univ. of London), Professor of Statistics, University of
Stellenbosch, Soeteweide, Stellenbosch, Union of South Africa.

Price, Don C., M.A. (Kent State Univ.), Student, Department of Mathematical
Statistics, University of North Carolina, 1621 Shorb Ave., NW, Canton 3,
Ohio.

Scalora, Frank S., A.B. (Harvard), Assistant in Mathematics, 106 Mathematics
Building, University of Illinois, Urbana, Illinois.

Somerville, Paul N., B.Sc. (Alberta, Canada), Graduate Student in Department
of Mathematical Statistics, University of North Carolina, 316-B Dormitory,
Chapel Hill, North Carolina.

Sirken, Monroe G., M.A. (Univ. of Calif. at L. A.), Research Associate,
Laboratory of Statistical Research, Department of Mathematics, University of
Washington, Seattle, Washington.

Steam, Joseph L., M.S. (College of N. Y.), Mathematician, U. S. Coast &
Geodetic Survey, Department of Commerce, Washington, D. C.

Whelan, Walter J., M.A. (Boston Univ.), Student, Department of Mathematical
Statistics, Columbia University, New York, 119 Wilmington Ave.,
Dorchester 24, Massachusetts.

Wile, Janet L., A.B. (Univ. of Rochester), Statistician, Department of Defense,
Army and Transportation Corps, ?J!156, 1813 Queens Lane, Arlington,
Virginia.

Wilhelmsen, Lars, Aktuarkandidat (Oslo Univ.), Actuary, Storebrand, Boks
425, Oslo, Norway.


REPORT OF THE CHAPEL HILL MEETING OF THE INSTITUTE 

The forty-second meeting of the Institute of Mathematical Statistics was
held jointly with the Biometric Society (Eastern North American Region) at
the Chapel Hill campus of the University of North Carolina on Friday, March
17, and Saturday, March 18, 1950. One hundred twenty-one persons registered,
including the following members of the Institute:

R. L. Anderson, T. W. Anderson, Geoffrey Beall, C. A. Bennett, Mrs. C. A. Bennett, Nils
Blomqvist, R. C. Bose, R. A. Bradley, Irwin Bross, Glen Burrows, L. D. Calvin, Uttam
Chand, W. G. Cochran, A. C. Cohen, Jr., W. S. Connor, Jr., P. P. Crump, E. E. Cureton,
R. C. Davis, W. L. Deemer, T. G. Donnelly, Churchill Eisenhart, J. W. Fertig, S. G.
Ghurye, Leon Gilford, B. G. Greenberg, F. E. Grubbs, Max Halperin, J. F. Hannan, Boyd
Harshbarger, C. R. Henderson, Wassily Hoeffding, Harold Hotelling, A. S. Householder,
W. G. Howard, S. L. Isaacson, A. W. Kimball, Jr., H. F. Kimball, Marguerite Lohr, Guido
Liserre, Eugene Lukacs, C. L. Marks, H. A. Meyer, Paul Minton, D. J. Morrow, Jack
Moshman, G. M. Motley, M. L. Norden, H. W. Norton, Ingram Olkin, Paul Peach, J. A.
Rafferty, Wyman Richardson, Jr., Herbert Robbins, S. N. Roy, S. A. Schmitt, H. E.
Serfling, H. H. Shepard, P. N. Somerville, E. W. Stacy, J. W. Tukey, D. F. Votaw, Jr.,
F. M. Wadley, M. A. Woodbury, Marvin Zelen.

Professor R. L. Anderson presided at the opening session for contributed
papers on Friday morning. The following papers were presented:

1. A Method of Estimating the Parameters of an Autoregressive Time Series. Mr. S. G.
Ghurye, University of North Carolina.

2. Most Powerful Rank Order Tests. Professor Wassily Hoeffding, University of North
Carolina.

3. The Comparison of Percentages in Matched Samples. Professor W. G. Cochran, Johns
Hopkins University.

4. A Method of Estimating Components of Variance in Disproportionate Numbers.
Professor H. L. Lucas, North Carolina State College.

5. On the Theory of Unbiased Tests of Simple Statistical Hypotheses Specifying the Values
of Two Parameters. Mr. S. L. Isaacson, Columbia University.

6. A Note on Orthogonal Arrays. Professor R. C. Bose, University of North Carolina.

7. Transformations Related to the Angular and the Square Root. Mr. M. F. Freeman
and Professor J. W. Tukey, Princeton University.

8. Standard Inverse Matrices for Fitting Polynomials. Mr. F. J. Verlinden, North
Carolina State College.

On Friday afternoon Dr. James A. Rafferty, School of Aviation Medicine,
Randolph Field, Texas, gave an invited address on Mathematical Models in
Biology. Professor Gertrude M. Cox then presided at a session for contributed
papers, at which the following papers were presented:

1. Small Sample Performance of Biological Statistics. Mr. Irwin Bross, Johns Hopkins
University.

2. Methodology in the Study of Physical Measurements of School Children. Professor
B. G. Greenberg and Professor A. H. Bryan, University of North Carolina.

3. Tetrad Analysis in Yeast. Dr. A. S. Householder, Oak Ridge National Laboratory.

4. Contribution to the Probabilistic Theory of Neural Nets: I. Randomization of
Refractory Periods and of Stimulus Intervals. Professor Anatol Rapoport, University of
Chicago.

5. Theoretical and Experimental Aspects in the Removal of Airborne Matter by the Human
Respiratory Tract. Professor H. D. Landahl, University of Chicago. (Read by
Professor Rapoport.)

6. An Application of Biometrics to Zoological Classifications. Dr. F. M. Wadley, Navy
Department, Washington, D. C.

7. The Analysis of Hematological Effects of Chronic Low-level Radiation. Mr. Jack
Moshman, United States Atomic Energy Commission, Oak Ridge, Tennessee.

A joint dinner of the two sponsoring organizations was held at the Carolina
Inn on Friday evening, with an attendance of sixty-two. Professor W. G. Cochran
as toastmaster introduced Chancellor R. B. House of the University of North
Carolina, who welcomed the gathering with words and music. Professor Gertrude
M. Cox responded for the Biometric Society and Professor D. F. Votaw for the
Institute.

Professor Harold Hotelling presided at a Saturday morning symposium on
multivariate analysis. Professor E. E. Cureton of the University of Tennessee
gave the opening address on Statistical Problems in Psychological Testing. After
a lively discussion the following contributed papers were presented:

1. Accuracy of a Linear Prediction Equation in a New Sample. Professor George E.
Nicholson, Jr., University of North Carolina.

2. Independence of Quadratic Forms in Normally Correlated Variables. Professor
Yukiyosi Kawada, Tokyo University of Literature and Science, Tokyo, Japan. (Read
by the chairman.)

3. Bounds on the Distribution of Chi-square. Mr. S. A. Vora, University of North
Carolina.

This was followed by a Biometric Society address by Professor C. R. Henderson
of Cornell University on Estimation of Genetic Parameters.

Professor W. G. Cochran presided at the final session for contributed papers
on Saturday afternoon. The following papers were presented:

1. Estimating the Mean and Standard Deviation of Normal Populations from Double
Truncated Samples. Professor A. C. Cohen, Jr., University of Georgia.

2. Minimax Estimates of Location and Scale Parameters. Mr. Gopinath Kallianpur,
University of North Carolina.

3. On Some Features of the Neyman-Pearson and Wald Theories of Statistical Inference,
Their Interrelations and Bearing on Some Usual Problems of Statistical Inference.
Professor S. N. Roy, University of North Carolina.

4. Note on Uniformly Best Unbiased Estimates. Mr. R. C. Davis, Naval Ordnance Test
Station, Inyokern, Calif.

5. Competitive Estimation. Professor Herbert Robbins, University of North Carolina.

6. The Effect of an Unknown "Location Disturbance" on "Student’s" t Based on a Linear
Regression Model. Professor Uttam Chand, Boston University.

7. Corrections for Non-normality for the Two-sample t and F Distributions Valid for
High Significance Levels. Professor Ralph A. Bradley, McGill University.

8. Some Tests Based on the Empirical Distribution Function. Mr. J. F. Hannan,
University of North Carolina.

9. On a Generalization of the Behrens-Fisher Problem. (By title.) Dr. John E. Walsh,
Rand Corporation, Santa Monica, Calif.

10. Construction of Partially Balanced Designs with Two Accuracies. (By title.) Mr.
S. S. Shrikhande, University of North Carolina and Nagpur College, Nagpur, India.

11. Designs for Two-way Elimination of Heterogeneity. (By title.) Mr. S. S. Shrikhande.

12. Designs for Annual Feeding Experiments. (By title.) Mr. S. S. Shrikhande.

13. A Truncated Sequential Procedure for Interval Estimation, with Applications to the
Poisson and Negative Binomial Distributions. (By title.) Mr. D. Martin Sandelius,
University of Washington and Uppsala University, Uppsala, Sweden.

14. A Generalization of the Method of Maximum Likelihood: Estimating a Mixing
Distribution. (By title.) Professor Herbert Robbins, University of North Carolina.

15. Smallest Average Confidence Sets for the Simultaneous Estimation of k Normal
Means. (By title.) Mr. Raghu Raj Bahadur, University of North Carolina.

About eighty-five members of the two organizations attended a tea given by
Professor and Mrs. Hotelling at the conclusion of the Saturday afternoon 
session. 


Herbert Robbins 

Assistant Secretary 




FUNDAMENTAL LIMIT THEOREMS OF PROBABILITY THEORY¹

By M. Loève²

University of California, Berkeley 

no sooner is Proteus caught 
than he changes his shape 

1. Introduction. The fundamental limit theorems of Probability theory may
be classified into two groups. One group deals with the problem of limit laws
of sequences of sums of random variables, the other deals with the problem of
limits of random variables, in the sense of almost sure convergence, of such
sequences. These problems will be labelled, respectively, the Central Limit
Problem (CLP) and the Strong Central Limit Problem (SCLP). Like all mathematical
problems, the CLP and SCLP are not static; as answers to old queries are
discovered they experience the usual development and new problems arise. The
development consists in (i) simplifying proofs and forging general tools out of
the special ones, (ii) sharpening and strengthening results, (iii) finding general
notions behind the results obtained and extending their domains of validity.
Analysis of this growth will put in relief the role and the interconnections of the
fundamental limit theorems.

Summary. The growth of the CLP for independent summands can be divided 
into three (overlapping) periods. The first covers the Bernoulli case and the
corresponding limit theorems of Bernoulli, de Moivre and Poisson. The first two 
theorems gave rise to the notions—from which the classical CLP stems—of 
the Law of Large Numbers (LLN) and of Normal Convergence (NC). Poisson’s 
approach belongs to the set-up of the modern CLP. 

The second period extends over two centuries and is devoted to the extension 
of the domains of validity of LLN and NC. This is the classical CLP period. 
Lyapunov’s crucial work, submitted to the above treatment, led to the discovery 
of the natural boundaries of these domains by Lindeberg, Kolmogorov, Feller 
and P. Lévy.

However, the LLN and NC problems are but two particular cases of the 
general problem of limit laws of sequences of sums of independent random 
variables. The coming into sight and the solution of this problem—the third 
period of the CLP—covers less than ten years. The tools forged for the classical 
CLP proved to be powerful enough and the final solution is due to P. Lévy,
Khintchine, Gnedenko and Doeblin.

¹ This paper was presented to the New York meeting of the Institute of Mathematical
Statistics on December 27, 1949.

Editor’s Note: The Institute of Mathematical Statistics has formed a Committee on
Special Invited Papers to invite lecturers to deliver expository addresses to the Institute
with the understanding that the Special Invited Papers are to be published in the Annals
of Mathematical Statistics. This paper is the first one invited by the Committee.

² This work is supported in part by the Office of Naval Research.


The CLP for dependent variables started with so-called Markoff chains.
The study of their limit properties is due essentially to Markov, S. Bernstein
and Doeblin. For more general forms of dependence the LLN and NC problems
were investigated by P. Lévy and Loève after the crucial work of S. Bernstein.
The modern CLP was considered only recently (Loève).

The SCLP stems from the strengthening by Borel of the Bernoulli theorem
and the sharpening of Borel’s result by Khintchine. They gave rise to the
notions of Strong Law of Large Numbers (SLLN) and of the Law of the Iterated
Logarithm (LIT).³ The domains of validity were extended to their boundaries
by Kolmogorov, P. Lévy and Feller. In the case of dependence, results are due
to G. D. Birkhoff, P. Lévy, W. Doeblin, and Loève. However, the SCLP has
not attained, at present, the harmonious development of the CLP.

Notations. Let $\mathcal{L}(X)$ be the law of a (real) random variable (r.v.) $X$. The law
is defined by the distribution function (d.f.) $F(x) = P\{X < x\}$. As is well known,
$\mathcal{L}(X)$ is determined by the characteristic function (ch.f.)

$$f(u) = \int e^{iux}\, dF(x), \qquad -\infty < u < +\infty.$$

When a r.v. possesses subscripts, the same subscripts will be used for its d.f.
and ch.f. $EX$ will denote the expectation of $X$:

$$EX = \int x\, dF(x),$$

and $\sigma^2(X)$ will denote the variance of $X$:

$$\sigma^2(X) = E(X - EX)^2.$$

With a random event $A$ we associate a r.v., to be called indicator of the event $A$,
which takes values 1 and 0 respectively, according as $A$ occurs or does not occur.
If $X$ is the indicator of an event $A$ of probability $p$, then $EX = p$ and $\sigma^2(X) = pq$,
where $q = 1 - p$. To avoid trivialities we shall assume that $pq \neq 0$.

Two laws $\mathcal{L}(X_1)$ and $\mathcal{L}(X_2)$ will be said to belong to the same complete type
if there exist two numbers $a \neq 0$ and $b$ such that $P\{X_1 \le x\} = P\{aX_2 + b \le x\}$.
If values of $a$ are restricted to positive values, then the two laws are said to
belong to the same type. If two independent r.v.’s obey $\mathcal{L}$ and their sum belongs
to the type of $\mathcal{L}$, then $\mathcal{L}$ and its type are said to be stable. Three classes of laws
play an essential role in the CLP: the normal and the degenerate types and
the Poisson complete types.

$\mathcal{N}(m, \sigma)$ is a normal law if it is defined by

$$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-m)^2}{2\sigma^2}}\, dt \qquad (\sigma > 0).$$

$\mathcal{L}(m)$ is a law degenerate at $m$, if it attaches probability 1 to the value $m$.
$\mathcal{P}(\lambda; a, b)$ is a Poisson law if

$$P\{X = ak + b\} = e^{-\lambda}\frac{\lambda^k}{k!} \quad (\lambda > 0), \qquad k = 0, 1, 2, \ldots;$$

the familiar Poisson law is $\mathcal{P}(\lambda; 1, 0)$.

A law $\mathcal{L}(X_n)$ is said to converge to the law $\mathcal{L}(X)$ as $n \to \infty$, if $F_n(x)$
converges to $F(x)$ at the continuity points of the latter. In this paper all limits will
be considered for $n \to \infty$, if not otherwise stated.

³ For a very thorough and deep analysis of the NC and LIT problems and their solutions
see Feller, Bull. Am. Math. Soc., Vol. 51 (1945), pp. 800-832, under the same title as that
of the present paper.

The structure of sequences of r.v.’s whose limit properties are investigated
will be called the limiting process of the problem. The limiting process of sequences
of sums is that of sequences of the form $S_{n,\nu_n} = \sum_{k=1}^{\nu_n} X_{n,k}$, where $\nu_n \to \infty$. The
limiting process of normed sums is that of sequences of the form $\frac{S_n}{a_n} - b_n$ with
$S_n = \sum_{k=1}^{n} X_k$, where $a_n > 0$ and $b_n$ are real numbers. Normed sums are a special
form of sequences of sums: take $\nu_n = n$, $X_{n,k} = \frac{X_k}{a_n} - \frac{b_n}{n}$, then $S_{n,\nu_n} = \frac{S_n}{a_n} - b_n$.

To avoid repetitions we shall note, once and for all, that limit types rather than
limit laws appear in the case of normed sums, because, if $\mathcal{L}(X)$ is their limit law,
then any law of its type is obtainable as a limit law by a convenient change of
origin and of scale $a_n$, independent of $n$. The importance of the notion of
type is due, primarily, to this property. In fact, even more is true: if $\mathcal{L}(X_n)$
converges to $\mathcal{L}(X)$ and $\mathcal{L}(a_n X_n + b_n)$ converges to $\mathcal{L}(Y)$, then $\mathcal{L}(X)$ and $\mathcal{L}(Y)$
belong to the same type, provided neither is degenerate (Khintchine [20]).

I. Central Limit Problem

2. Origin of the CLP; Binomial case. Three limit theorems are at the origin
of the CLP; the first, due to Bernoulli ([2], 1713), laid the ground. Let $S_n$ be
the number of occurrences of an event $A$ of probability $p$ in $n$ identical and
independent trials. Then, for every $\varepsilon > 0$,

$$P\left\{\left|\frac{S_n}{n} - p\right| > \varepsilon\right\} \to 0.$$

Bernoulli found this result by a direct, but cumbersome, analysis of the
behaviour of the binomial probabilities

$$P\{S_n = k\} = C_n^k\, p^k q^{n-k}, \qquad k = 0, 1, \ldots, n.$$

Sharpening this analysis, de Moivre ([7], 1730) obtained the second limit theorem
of probability theory which, in the form given to it by Laplace, states that:
For every $x$,

$$P\left\{\frac{S_n - np}{\sqrt{npq}} < x\right\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}}\, dt.$$


Suppose now, with Poisson ([36], 1837), that the probability $p = p_n$ depends
upon the number $n$ of trials and, more precisely, that $p_n = \frac{\lambda}{n}$, where $\lambda$ is a
positive constant. Write then $S_{n,n}$, instead of $S_n$, for the number of occurrences
of the considered event in a group of $n$ trials. By a direct analysis of the binomial
probabilities, much easier to carry out than the preceding ones, it follows that
for $k = 0, 1, \ldots$,

$$P\{S_{n,n} = k\} \to e^{-\lambda}\frac{\lambda^k}{k!}.$$


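Let $X_k$ be the indicator of the event $A$ in the $k$-th trial. The number of
occurrences $S_n$ is the sum $\sum_{k=1}^{n} X_k$ of $n$ of these independent and identically
distributed indicators. The first two limit theorems mean that

$$\mathcal{L}\left(\frac{S_n}{n} - p\right) \to \mathcal{L}(0) \quad \text{and} \quad \mathcal{L}\left(\frac{S_n - np}{\sqrt{npq}}\right) \to \mathcal{N}(0, 1).$$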
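The normal convergence of the normed binomial sums can be checked numerically. The following Python sketch sums the binomial probabilities exactly and compares the d.f. of $(S_n - np)/\sqrt{npq}$ with the normal d.f. at a few points. The value p = 0.3, the sample sizes, and the function names are illustrative assumptions, not taken from the text.

```python
import math


def binomial_pmf(n: int, k: int, p: float) -> float:
    """P{S_n = k} = C(n, k) p^k q^(n-k), evaluated through logarithms for stability."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def normed_sum_df(n: int, p: float, x: float) -> float:
    """P{(S_n - np) / sqrt(npq) < x}, summed exactly over the binomial probabilities."""
    threshold = n * p + x * math.sqrt(n * p * (1.0 - p))
    return sum(binomial_pmf(n, k, p) for k in range(n + 1) if k < threshold)


def normal_df(x: float) -> float:
    """Distribution function of the normal law with mean 0 and standard deviation 1."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    p = 0.3  # illustrative probability (assumption); any 0 < p < 1 works
    for n in (25, 100, 400, 1600):
        gap = max(abs(normed_sum_df(n, p, x) - normal_df(x)) for x in (-1.0, 0.0, 1.0))
        print(f"n = {n:5d}   largest gap at x = -1, 0, 1: {gap:.4f}")
```

Because $S_n$ is integer-valued, the gap at a fixed point need not decrease monotonically in n; only its overall order of magnitude shrinks.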


Thus we have two limiting processes (both special and completely specified
forms of normed sums), and two limit laws (more precisely two limit types, see
introduction), a degenerate and a normal one.

Poisson’s limiting process is utterly different. $S_{n,n}$ is still a sum $\sum_{k=1}^{n} X_{n,k}$
of independent and identically distributed indicators but, as $n$ varies, all $X_{n,k}$
change, $P\{X_{n,k} = 1\} = \frac{\lambda}{n}$ and

$$\mathcal{L}(S_{n,n}) \to \mathcal{P}(\lambda; 1, 0).$$
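Poisson's limit can be illustrated in the same direct way. The sketch below compares the binomial probabilities for n trials of probability λ/n with the Poisson probabilities $e^{-\lambda}\lambda^k/k!$. The value λ = 2, the range of k, and the function names are assumptions chosen only for illustration.

```python
import math


def trial_sum_pmf(n: int, k: int, lam: float) -> float:
    """P{S_{n,n} = k} when each of the n independent trials has probability lam / n."""
    p = lam / n
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Probability attached to k by the familiar Poisson law with parameter lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_gap(n: int, lam: float, k_max: int = 10) -> float:
    """Largest difference between the two probabilities over k = 0, ..., k_max (needs n > k_max)."""
    return max(abs(trial_sum_pmf(n, k, lam) - poisson_pmf(k, lam)) for k in range(k_max + 1))


if __name__ == "__main__":
    lam = 2.0  # illustrative value (assumption)
    for n in (20, 100, 1000, 10000):
        print(f"n = {n:6d}   largest gap for k <= 10: {max_gap(n, lam):.6f}")
```

Here every summand changes with n, which is exactly what distinguishes this limiting process from that of normed sums.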


While the two first theorems with their special limiting processes and limit
laws played a central role in the development of Probability theory, Poisson’s
result stood isolated and ignored until about fifteen years ago.⁴ We shall see
further that there was a deep reason for its isolation and also that, surprisingly
enough, Poisson laws are, in a sense, more fundamental for the CLP than the
normal law.

3. The classical CLP and its extension. From the time of Laplace until 1935,
research in the domain of limit laws was centered about the extension to
summands other than indicators of the validity of the two first limit theorems.
This is the period of the classical CLP: Let $S_n = \sum_{k=1}^{n} X_k$ be sums of independent
r.v.’s. Find necessary and sufficient conditions for the LLN and for NC, i.e.,
conditions under which, respectively,

$$\mathcal{L}\left(\frac{S_n - ES_n}{n}\right) \to \mathcal{L}(0) \quad \text{and} \quad \mathcal{L}\left(\frac{S_n - ES_n}{\sigma(S_n)}\right) \to \mathcal{N}(0, 1).$$

⁴ In Uspensky’s textbook (1937!) Poisson’s law is mentioned once, in an exercise.


It is assumed that $EX_k$’s and $EX_k^2$’s exist. The d.f. not being completely
specified as in the Bernoulli case, the direct Bernoulli-de Moivre approach is of no
avail and general methods are necessary. The first to appear was the method of
moments relative to bounds of d.f. in terms of their moments (Tchebicheff [40],
Markov [37]). The relation

$$P\left\{\left|\frac{S_n - ES_n}{n}\right| \ge \varepsilon\right\} \le \frac{1}{\varepsilon^2}\,\frac{\sigma^2(S_n)}{n^2}, \qquad \varepsilon > 0,$$

together with

$$\sigma^2(S_n) = \sum_{k=1}^{n} \sigma^2(X_k),$$

entails at once a LLN theorem (Tchebicheff-Markov): If

$$\frac{1}{n^2}\sum_{k=1}^{n} \sigma^2(X_k) \to 0,$$

then the LLN holds.
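For indicators, where $\sigma^2(S_n) = npq$, the Tchebicheff bound $\sigma^2(S_n)/(\varepsilon^2 n^2)$ can be set beside the exact deviation probability. The Python sketch below does so; the values p = 0.3 and ε = 0.05 and the function names are illustrative assumptions.

```python
import math


def binomial_pmf(n: int, k: int, p: float) -> float:
    """P{S_n = k} for n independent indicators of probability p (log form for stability)."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def deviation_probability(n: int, p: float, eps: float) -> float:
    """Exact value of P{|S_n / n - p| >= eps}, with ES_n = np."""
    return sum(binomial_pmf(n, k, p) for k in range(n + 1) if abs(k - n * p) >= n * eps)


def tchebicheff_bound(n: int, p: float, eps: float) -> float:
    """sigma^2(S_n) / (eps^2 n^2), where sigma^2(S_n) = n p q for indicators."""
    return p * (1.0 - p) / (n * eps**2)


if __name__ == "__main__":
    p, eps = 0.3, 0.05  # illustrative values (assumptions)
    for n in (100, 1000, 10000):
        exact = deviation_probability(n, p, eps)
        bound = tchebicheff_bound(n, p, eps)
        print(f"n = {n:6d}   exact = {exact:.6f}   bound = {bound:.6f}")
```

The bound tends to zero like 1/n, which is all the LLN requires, while the exact probability decreases much faster; this looseness is typical of moment bounds.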

This result can be easily improved (bringing it into closer analogy with
Lyapunov’s theorem): If there exists a constant $\delta > 0$ such that

$$\frac{1}{n^{1+\delta}}\sum_{k=1}^{n} E\,|X_k - EX_k|^{1+\delta} \to 0,$$

then the LLN holds.

It contains then a Markov’s LLN condition: LLN holds if $E\,|X_k - EX_k|^{1+\delta} \le C$,
where $C$ is independent of $k$.

In a much more elaborate form the method of moments gives also a NC
theorem (Tchebicheff-Markov): If $EY_n^k \to EZ^k$ for $k = 1, 2, \ldots$, and $\mathcal{L}(Z) =
\mathcal{N}(0, 1)$, then $\mathcal{L}(Y_n) \to \mathcal{N}(0, 1)$.

This theorem has been extended to more general limit laws. However the
inherent defects of the method of moments remain. Even if moments of all
orders exist, they do not necessarily determine a unique d.f. A definitive result
in this direction is the Fréchet-Shohat theorem: If $EY_n^k \to m^{(k)}$ for all $k$, there exists
a subsequence $\mathcal{L}(Y_{n_j})$ which converges to a limit law $\mathcal{L}$ with moments $m^{(k)}$.
Moreover, if the moment problem is determined, i.e., if the $m^{(k)}$ determine a unique law,
then the whole sequence $\mathcal{L}(Y_n)$ converges to $\mathcal{L}$.

To apply the convergence theorem to the NC part of the classical CLP,
one has to assume existence of moments of all orders. In particular, it does not
seem suitable for proving Lyapunov’s theorem. Yet, the simple truncation idea
(Markov) not only overcomes this seemingly insurmountable obstacle, but also
provides a method per se. It associates with the summands $X_k$ "truncated"
r.v.’s $X_k^*$; for $k \le n$ and $c_n$ conveniently chosen real numbers,

$$X_k^* = X_k \ \text{ if } |X_k| \le c_n, \qquad X_k^* = 0 \ \text{ if } |X_k| > c_n.$$

Nevertheless, the method of moments is too cumbersome and was soon to be
discarded in favor of that of ch.f.’s.

The turning point for the entire CLP is Lyapunov’s introduction of the
method of ch.f.’s. The ch.f.’s were well known and used already by Laplace.
However, the first convergence property, proved but not stated, is due to
Lyapunov [28]: If the ch.f.’s $g_n(u)$ of $\mathcal{L}(Y_n)$ converge to the ch.f. of $\mathcal{N}(0, 1)$,
then $\mathcal{L}(Y_n) \to \mathcal{N}(0, 1)$. From it he deduced the first general NC theorem [28, 29]:
If there exists a number $\delta > 0$ such that

$$\frac{1}{\sigma^{2+\delta}(S_n)}\sum_{k=1}^{n} E\,|X_k - EX_k|^{2+\delta} \to 0,$$

then NC holds.

The ch.f. became, in the hands of P. Lévy [21], a general tool, instrumental
in the subsequent tremendous growth of the CLP, with the so-called

Continuity Theorem. If the ch.f.’s $g_n(u)$ converge to a function $g(u)$
continuous at $u = 0$, then $\mathcal{L}(Y_n)$ converge to a limit law $\mathcal{L}$ and $g(u)$ is its ch.f., and
conversely.

The methods of ch.f. and of truncation dominate at present the limit
problems of Probability theory.
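The ch.f. method is easy to illustrate in the Bernoulli case, where the ch.f. of the normed sum is available in closed form. The Python sketch below evaluates the ch.f. $g_n(u)$ of $Y_n = (S_n - np)/\sqrt{npq}$ and compares it with $e^{-u^2/2}$, the ch.f. of the normal law with mean 0 and variance 1. The value p = 0.3, the points u, and the function names are illustrative assumptions.

```python
import cmath
import math


def chf_normed_bernoulli_sum(n: int, p: float, u: float) -> complex:
    """ch.f. of Y_n = (S_n - np) / sqrt(npq), S_n being a sum of n independent indicators."""
    q = 1.0 - p
    s = math.sqrt(n * p * q)
    # One centered and scaled indicator takes the values -p/s and q/s
    # with probabilities q and p, so its ch.f. is q e^{-iup/s} + p e^{iuq/s}.
    one_summand = q * cmath.exp(-1j * u * p / s) + p * cmath.exp(1j * u * q / s)
    # Independence: the ch.f. of the sum is the product of the ch.f.'s.
    return one_summand**n


def chf_standard_normal(u: float) -> float:
    """ch.f. of the normal law with mean 0 and variance 1."""
    return math.exp(-0.5 * u * u)


if __name__ == "__main__":
    p = 0.3  # illustrative probability (assumption)
    for n in (10, 100, 10000):
        gaps = [abs(chf_normed_bernoulli_sum(n, p, u) - chf_standard_normal(u))
                for u in (0.5, 1.0, 2.0)]
        print(f"n = {n:6d}   largest gap at u = 0.5, 1, 2: {max(gaps):.6f}")
```

By the continuity theorem, pointwise convergence of these ch.f.'s is enough to conclude NC, with no separate analysis of the binomial probabilities.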

In spite of the generality of the above conditions for LLN and NC, they are
not necessary conditions. In fact they are not sharp enough since they assume
the existence of moments of higher order than those which figure in the classical
CLP. However the tools forged proved powerful enough to get its complete
solution. The truncation method yielded to Kolmogorov ([16], 1928) the
complete answer to the LLN problem. A "smoothing" device, due to Lyapunov,
provided Lindeberg ([20], 1922) with adequately sharp sufficient conditions;
using ch.f.’s, P. Lévy ([22], 1922) proved Lindeberg’s result and Feller ([11], 1935)
showed that, under a natural restriction, these conditions are also necessary.

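Solution of the classical CLP.

1. LLN holds if, and only if,

$$\sum_{k=1}^{n} \int_{|x|>n} dF_k(x + EX_k) \to 0 \quad \text{and} \quad \sum_{k=1}^{n} \frac{1}{n^r}\int_{|x|<n} x^r\, dF_k(x + EX_k) \to 0$$

for $r = 1, 2$.

2. NC holds and $\max_{k \le n} \frac{\sigma(X_k)}{\sigma(S_n)} \to 0$ if, and only if, for every $\varepsilon > 0$,

$$\frac{1}{\sigma^2(S_n)}\sum_{k=1}^{n} \int_{|x| > \varepsilon\sigma(S_n)} x^2\, dF_k(x + EX_k) \to 0.$$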


An unsatisfactory feature of the classical CLP is the assumption, made at
the start, of existence of certain moments. They are used to avoid, as $n \to \infty$,
the shift, towards infinite values, of the probability spread by changing the
origin and the scale of values of $S_n$. However there is no specific reason for
these special choices of norming quantities $a_n$ and $b_n$ except that, historically,
they appeared as a straightforward extension of Bernoulli and de Moivre ones.
Moreover, even if these moments do not exist, there is no reason not to try
to find norming quantities. (Take $X_k$’s to be independent and identically
distributed as follows: to the values [...], where $m = 1, 2, \ldots$, attach
probabilities [...]$/(\pi^2 m^2)$. The second moments are infinite, yet norming $S_n$ by
$c\sqrt{n \log n}$, we have NC.) Thus the CLP becomes the problem of the LLN and NC
for general normed sums $\frac{S_n}{a_n} - b_n$.

The extended classical NC problem was solved, masterfully and independently,
by Feller ([10], 1935) using ch.f.’s and by P. Lévy ([25], 1935) who applied the
method of truncation. The extension of the results to the more general set-up
of the following section is trivial and will be given there. Feller also solved
([11], 1937) the extended LLN problem.

In this new set-up a question arises at once. Given the r.v.’s $X_k$, do there
exist numbers which will produce the desired convergence? If so, how can they be
found? This problem is perhaps more difficult than the previous one and is
specifically linked with the limiting process of normed sums. We shall give
here a criterion, due to Feller ([10], 1935), which solves entirely the NC
problem.⁵ Take as origin of values of the summands their medians and let $c_n(\varepsilon)$ be
the g.l.b. of the $x$’s for which $\sum_{k=1}^{n} P\{|X_k| > x\} \le \varepsilon$. Then norming quantities
$a_n$ and $b_n$ such that

$$\mathcal{L}\left(\frac{S_n}{a_n} - b_n\right) \to \mathcal{N}(0, 1) \quad \text{and} \quad \max_{k \le n} P\left\{\left|\frac{X_k}{a_n}\right| > \varepsilon\right\} \to 0$$

exist if, and only if, for every $\varepsilon > 0$,

$$\frac{1}{c_n^2(\varepsilon)}\sum_{k=1}^{n} \int_{|x| \le c_n(\varepsilon)} x^2\, dF_k(x) \to \infty.$$


4. Modern CLP. At the same time that the classical CLP neared its happy
end, a new and much wider problem of limit laws appeared and, because the
necessary tools were at hand, was solved almost at once. Various particular
problems, of which the classical CLP is one, contributed to its set-up.

Since the discovery, in the Bernoulli case, of the LLN and NC, the problem
of limit laws has been centered about extensions of their domains of validity
for more and more general normed sums. A similar query about the Poisson
convergence would have provided us with a new problem. As soon as we drop
the restriction that in $S_{n,\nu_n} = \sum_{k=1}^{\nu_n} X_{n,k}$ the r.v.’s $X_{n,k}$ are indicators, we are
led to the problem of finding conditions under which laws of sums of
independent r.v.’s will converge to a Poisson law. We have here not only a different
limit law than in the CLP but also a more general limiting process. An utterly
different problem, stated and solved by P. Lévy [21], is the following: find the
possible limit laws of normed sums of independent and identically distributed r.v.’s
(the answer is that they are the stable laws). For the first time one does not inquire
about a completely specified limit law but about the class of all limit laws for
a fairly general limiting process. Thus, starting with limit theorems with
completely specified limiting processes and limit laws, after two centuries of struggle
Probability theory got rid of initial restrictions.

⁵ As for the LLN, norming numbers such that the LLN holds always exist, whatever
be the r.v.’s $X_k$. Hence, from the point of view of limit types of normed sums, the
degenerate type is to be considered as a degenerate form of every limit type.

The general set-up is now visible. The limiting process is that of sequences of
sums of independent r.v.’s. The queries are about the classes of possible limit
laws and conditions of convergence. However, so general a limit problem is
without content. In fact, the limiting process is that of arbitrary sequences of
r.v.’s: let $\{Y_n\}$ be any sequence of r.v.’s and take $X_{n,1} = Y_n$, $\mathcal{L}(X_{n,k}) = \mathcal{L}(0)$
for $k > 1$. Any law $\mathcal{L}$ belongs to the class of limit laws: take $\mathcal{L}(Y_n) = \mathcal{L}$. Hence
some restriction is needed. To find a "natural" restriction consider the previous
problems. Their common feature is that the limiting process is that of sequences
of sums of independent r.v.’s, the number of summands increasing indefinitely.
If we wish to emphasize this feature, a relatively small number of summands
ought not to have a preponderant role in the determination of the limit laws.
A "natural" restriction is then a requirement of uniform asymptotic negligibility
(uan) of the summands, i.e., for every $\varepsilon > 0$, $P\{|X_{n,k}| > \varepsilon\} \to 0$ uniformly in $k$.

We come thus to the Modern CLP. Let $S_{n,\nu_n} = \sum_{k=1}^{\nu_n} X_{n,k}$, $\nu_n \to \infty$, be sums
of r.v.’s $X_{n,k}$, mutually independent for every fixed $n$, and such that

$$\max_{k} P\{|X_{n,k}| > \varepsilon\} \to 0;$$

characterize the class $\{\mathcal{D}\}$ of limit laws of the $S_{n,\nu_n}$ and find necessary and sufficient
conditions for convergence to any element of this class.

The solution of this problem is essentially due to the results of investigation
of random functions $X(t)$ with independent increments. Let $X(0) = 0$, divide
the interval $(0, t)$ into $\nu_n$ subintervals $(t_{k-1}, t_k)$ with $t_0 = 0$, and denote by $X_{n,k}$
the increment $X(t_k) - X(t_{k-1})$. Then $X(t) = \sum_{k=1}^{\nu_n} X_{n,k}$, where the $X_{n,k}$ are independent
r.v.’s. If, moreover, $X(t)$ is continuous in probability for every $t$, i.e., if
$\mathcal{L}(X(t + h) - X(t)) \to \mathcal{L}(0)$ as $h \to 0$, then the $X_{n,k}$ can be chosen to obey
the uan restriction as $\nu_n \to \infty$. Hence $\mathcal{L}\{X(t)\}$ might be expected to belong
to $\{\mathcal{D}\}$.

The particular case of the modern CLP for summands and limit laws with
finite second moments was solved by Bawly [1], using Kolmogorov’s
characterization of $X(t)$’s with finite second moments [7]. The general problem,
thanks to a much more general result by P. Lévy ([24], 1934), was solved by
P. Lévy, Khintchine ([20], 1937), Gnedenko ([14], [15], 1938, 1939) and Doeblin
([8], 1938-1939). The method used throughout was that of ch.f.’s (except in
the case of Doeblin who used also the P. Lévy "dispersion" function).

One can avoid an explicit introduction of the considered random function
$X(t)$, limiting oneself to the corresponding (infinitely divisible) laws. For a
very large $n$, $S_{n,\nu_n}$ is, roughly speaking, a very large number $\nu_n$ of very small
(in probability) independent summands. This leads at once to the consideration
of laws which possess such a property for any $n$ and, first, the infinitely divisible
(i.d.) laws. A law is i.d. if it is a law of sums of an arbitrarily large number of
independent and identically distributed r.v.’s. In other words, $f(u)$ is the ch.f.
of an i.d. law if $[f(u)]^{1/n}$ is a ch.f. for every positive integer $n$. One might expect i.d.
laws to belong to $\{\mathcal{D}\}$ and, surprisingly enough, it turns out that, because of
the uan, $\{\mathcal{D}\}$ contains only i.d. laws.
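Infinite divisibility of the two basic laws can be verified directly on their ch.f.'s. The Python sketch below checks that the ch.f. of the Poisson law with parameter λ is the n-th power of the ch.f. of the Poisson law with parameter λ/n, and that the ch.f. of a normal law with mean m and standard deviation σ is the n-th power of that of a normal law with mean m/n and standard deviation σ/√n. Raising to the n-th power, rather than extracting an n-th root, is a choice made here to avoid branch ambiguities; the numerical values and function names are illustrative assumptions.

```python
import cmath
import math


def chf_poisson(u: float, lam: float) -> complex:
    """ch.f. of the familiar Poisson law with parameter lam: exp(lam (e^{iu} - 1))."""
    return cmath.exp(lam * (cmath.exp(1j * u) - 1.0))


def chf_normal(u: float, m: float, sigma: float) -> complex:
    """ch.f. of the normal law with mean m and standard deviation sigma."""
    return cmath.exp(1j * m * u - 0.5 * (sigma * u) ** 2)


def poisson_root_gap(u: float, lam: float, n: int) -> float:
    """|f_{lam/n}(u)^n - f_lam(u)|: zero (up to rounding) if the n-fold split is exact."""
    return abs(chf_poisson(u, lam / n) ** n - chf_poisson(u, lam))


def normal_root_gap(u: float, m: float, sigma: float, n: int) -> float:
    """Same check for the normal law, split into n summands of mean m/n and s.d. sigma/sqrt(n)."""
    part = chf_normal(u, m / n, sigma / math.sqrt(n))
    return abs(part**n - chf_normal(u, m, sigma))


if __name__ == "__main__":
    for n in (2, 7, 50):
        print(n, poisson_root_gap(1.3, 2.0, n), normal_root_gap(1.3, 0.5, 1.5, n))
```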

We can now state the solution of the modern CLP, in three parts. Let

$$\int_{-\infty}^{+\infty} = \int_{-\infty}^{-0} + \int_{+0}^{+\infty},$$

let $\varphi(x)$ be any function, defined and non-decreasing in $(-\infty, -0)$ and
$(+0, +\infty)$, with $\varphi(-\infty) = \varphi(+\infty) = 0$ and $\int_{-b}^{+b} x^2\, d\varphi(x) < \infty$,
and let $\alpha$ and $\beta$ be real numbers.

I. The function $f(u)$ is the ch.f. of an i.d. law if, and only if,

$$\log f(u) = i\alpha u - \frac{\beta^2}{2}u^2 + \int_{-\infty}^{+\infty}\left(e^{iux} - 1 - \frac{iux}{1 + x^2}\right) d\varphi(x)$$

and $f(u)$ determines uniquely $\alpha$, $\beta^2$ and $\varphi(x)$ at all the continuity points of the latter
(P. Lévy).

Normal laws are obtained for $\phi(x) \equiv 0$ and Poisson laws correspond to the
$\phi(x)$ with one point of increase ($x \neq 0$) only. The fundamental role of
Poisson laws appears clearly since, roughly speaking, an i.d. law is the
convolution of a normal law and a continuum of Poisson ones. This role is
further emphasized by the following theorem (Khintchine [20]): A law is i.d.
if, and only if, it is the limit law of sequences of sums of independent
Poisson r.v.'s. In other words, the class of i.d. laws is the closure of laws of
finite sums of independent Poisson r.v.'s.
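As a small numerical illustration of statement I, the sketch below evaluates the representation for a $\phi(x)$ that is a pure step function, so that the integral reduces to a finite sum. It assumes the integrand $e^{iux} - 1 - iux/(1 + x^2)$; the function names and the choice of a single jump of size $\lambda$ at $x = 1$ are illustrative only. With $\alpha = \lambda/2$ and $\beta = 0$ the formula should then reproduce the classical Poisson ch.f.

```python
import cmath

def log_chf_levy(u, alpha, beta2, jumps):
    """log f(u) in the Levy form, for a phi(x) that is a pure step function.

    jumps: list of (x, mass) pairs giving the points of increase of phi
    (x != 0) and the size of each jump, so the integral is a finite sum.
    """
    total = 1j * alpha * u - 0.5 * beta2 * u * u
    for x, mass in jumps:
        total += mass * (cmath.exp(1j * u * x) - 1 - 1j * u * x / (1 + x * x))
    return total

def log_chf_poisson(u, lam):
    """Classical log ch.f. of a Poisson law with mean lam."""
    return lam * (cmath.exp(1j * u) - 1)

lam = 2.5
for u in (-3.0, -0.7, 0.0, 1.3, 4.0):
    # One jump of size lam at x = 1; alpha = lam / 2 cancels the centering term.
    levy = log_chf_levy(u, alpha=lam / 2, beta2=0.0, jumps=[(1.0, lam)])
    assert abs(levy - log_chf_poisson(u, lam)) < 1e-12
```

Passing an empty `jumps` list with `beta2 > 0` gives the normal case $\phi(x) \equiv 0$.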

II. The class $(D)$ of limit laws of the modern CLP coincides with that of i.d.
laws (P. Lévy-Khintchine).

Together with I this result characterizes in an explicit manner the class $(D)$.
An immediate question arises (Khintchine): What about the limit laws of normed
sums? The answer is the following (P. Lévy [27]). Let $\psi_1(y) = \phi(x)$
for $x < 0$, $\psi_2(y) = -\phi(x)$ for $x > 0$, where $y = \log |x|$. The limit
laws of normed sums, under uan, are the i.d. laws with convex $\psi_k(y)$, $k = 1, 2$.

In particular a Poisson law does not belong to this subclass $(D_N)$ of $(D)$,
hence cannot be obtained as a limit law of normed sums. This brings out the
deep reason for the isolation in which the Poisson law remained as long as the
limiting process was restricted to that of normed sums.⁶ II shows that, with
respect to the possible limit laws, the limiting process of the modern CLP is
definitely wider than that of the classical CLP and of its extension. However
the entire class $(D)$ can be obtained with normed sums, provided we consider

⁶ A problem, specific for normed sums, arises: given r.v.'s $X_k$, find necessary and
sufficient conditions for existence of norming numbers such that the laws of normed sums
would converge to a given element of $(D_N)$ and, if they exist, find them. Feller's NC
criterion solves a particular case of this problem.





not only limit laws but also "accumulation" laws (P. Lévy-Khintchine): A law
is i.d. if, and only if, it is the limit law of a subsequence of normed sums of
independent and identically distributed r.v.'s.

I and II provided Gnedenko and, independently, Doeblin with the properties
which allowed them to find conditions of convergence, thus completing the
solution of the modern CLP. Let

$$\sigma_\epsilon^2(X) = \int_{|x| < \epsilon} x^2\, dF(x) - \left[ \int_{|x| < \epsilon} x\, dF(x) \right]^2$$

denote a "truncated" variance of $X$.

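III. Under uan, $\mathcal{L}(S_{n,\nu_n} - b_n)$ converges, necessarily to an i.d. law,
"for a convenient choice of $b_n$", if, and only if,

(i) $\sum_{k=1}^{\nu_n} F_{nk}(x) \to \phi(x)$ for $x < 0$, $\sum_{k=1}^{\nu_n} [1 - F_{nk}(x)] \to -\phi(x)$ for $x > 0$

at the continuity points of $\phi(x)$, and

(ii) $\lim_{\epsilon \to 0} \limsup_{n} \sum_{k=1}^{\nu_n} \sigma_\epsilon^2(X_{nk}) = \lim_{\epsilon \to 0} \liminf_{n} \sum_{k=1}^{\nu_n} \sigma_\epsilon^2(X_{nk}) = \beta^2$.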

In particular, since normal laws correspond to $\phi(x) \equiv 0$, the NC conditions
of Feller and P. Lévy follow: $\mathcal{L}(S_{n,\nu_n} - b_n)$ converges to $\mathcal{N}(0, 1)$
for a convenient choice of $b_n$ and uan holds if, and only if, for every $\epsilon > 0$,

(i) $\sum_{k=1}^{\nu_n} \int_{|x| > \epsilon} dF_{nk}(x) \to 0$ and (ii) $\sum_{k=1}^{\nu_n} \sigma_\epsilon^2(X_{nk}) \to 1$.

The first condition shows that among all limit laws under uan, limit normality
corresponds to a sufficiently strong asymptotic negligibility of the
summands, and, more precisely, to

$$\sum_{k=1}^{\nu_n} P(|X_{nk}| > \epsilon) \to 0,$$

or, equivalently, to

$$P(\max_k |X_{nk}| > \epsilon) \to 0.$$

Another illuminating characterization of NC (Raikov [39]) follows also from
III. Take for origin of values of summands the truncated first moments
$\int x\, dF_{nk}(x)$. Then $\mathcal{L}(S_{n,\nu_n} - b_n) \to \mathcal{N}(0, 1)$ for a
convenient choice of $b_n$ if, and only if, $\mathcal{L}(\sum_k X_{nk}^2) \to \mathcal{L}(1)$.

5. CLP in the case of dependence. Limit problems for sums of dependent
r.v.'s were considered for the first time by Markov [37], less than fifty years ago.
He extended the first two limit theorems of probability theory to the case of
events linked in chain, i.e., such that $P(A_k | A_1, \cdots, A_{k-1}) = P(A_k | A_{k-1})$.





However the crucial work in this field is the celebrated memoir by S. Bernstein
([3], 1927) which has the same historical importance for the dependence
case as that of Lyapunov has for the classical CLP.

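Let $\{X_k\}$ be a sequence of r.v.'s. $E'X_k$ will denote the conditional expectation
of $X_k$, given $X_1, \cdots, X_{k-1}$. Consider the sequence of sums
$S_n = \sum_{k=1}^{n} X_k$, with $EX_k = 0$, and let $\sigma_n = \sqrt{\sum_{k=1}^{n} \sigma^2(X_k)}$.

Bernstein's NC Theorem. If

(i) $\frac{1}{\sigma_n} \sum_{k=1}^{n} \sup |E'X_k| \to 0$, (ii) $\frac{1}{\sigma_n^2} \sum_{k=1}^{n} \sup |E'X_k^2 - EX_k^2| \to 0$,

and

(iii) $\frac{1}{\sigma_n^3} \sum_{k=1}^{n} \sup E'|X_k|^3 \to 0$,

then

$$\mathcal{L}\left(\frac{S_n}{\sigma_n}\right) \to \mathcal{N}(0, 1).$$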

Obviously, if the $X_k$'s are independent, this theorem reduces to Lyapunov's
with $\delta = 1$. The method used is still that of ch.f.'s. From this result Bernstein
deduces various particular NC cases and, applying them to Markov chains,
extends the latter's results.

The unpleasant feature of the above theorem is the use of suprema of conditional
expectations and, except when the r.v.'s $X_k$ are bounded, one cannot expect
these suprema to be finite. On the other hand, the conditional expectations
are r.v.'s and it would be natural to associate their values with the corresponding
probabilities. This can be done and Bernstein's theorem can be improved in
various directions simultaneously. First it may be stated for sequences of sums
$S_{n,\nu_n}$ (this is trivial); next it extends to $\delta > 0$ instead of $\delta = 1$ (this
contains completely Lyapunov's result but is of secondary interest). Then NC
can be replaced by asymptotic normality, i.e., by the existence of a sequence of
normal laws $\mathcal{N}(0, v_n)$ such that the "distance" between $\mathcal{L}(S_{n,\nu_n})$ and
$\mathcal{N}(0, v_n)$ would approach zero as $n \to \infty$ (this is quite simple to get).
However, significant improvements are obtained on replacing suprema by
expectations. Let $F_n(x)$ be the d.f. of $S_{n,\nu_n}$ and $G_n(x)$ be that of
$\mathcal{N}(0, \sigma_n^2)$. Then, taking $EX_{nk} = 0$, we have the following

NC Theorem. If (i) $\sum_k E|E'X_{nk}| \to 0$, (ii) $\sum_k E|E'X_{nk}^2 - EX_{nk}^2| \to 0$
and (iii) there exists a constant $\delta > 0$ such that $\sum_k E|X_{nk}|^{2+\delta} \to 0$, then
$F_n(x) - G_n(x) \to 0$.

This theorem shows that, so far as moments of order higher than the second are
concerned, the NC condition is the same as in the case of independence. In this
last case the theorem is a slight improvement of that of Lyapunov. In 1941
conditions for LLN and NC were given (Loève [31], [32]) in the frame of the modern
CLP, without assuming the existence of moments; when independence is
assumed, they reduce to those given by Feller. Conditions for NC which, in the
case of independence, reduce to Lindeberg's, were then deduced in the particular
case of finite second moments and special cases of NC, including those
considered by S. Bernstein, were obtained.

The whole modern CLP had not been considered until lately (Loève, [33-35]).
It appeared useful to extend the CLP to an "Asymptotic Central Problem"
(ACP); primarily, to the behavior of $\mathcal{L}(S_{n,\nu_n})$ as $n \to \infty$. This, in
turn, led to the introduction of laws "in a wide sense," i.e., with possible
positive probabilities for infinite values. To the sequence $\{\mathcal{L}(S_{n,\nu_n})\}$ is
associated another conveniently chosen sequence $\mathcal{L}_n$ of laws of sums; if
$\mathcal{L}_n \to \mathcal{L}$ or $\mathcal{L}_n = \mathcal{L}$ then the ACP reduces to the CLP. The
investigation uses an extension of the P. Lévy convergence theorem for ch.f.'s
and the modern CLP solutions are obtained as particular cases. The case of sums
of a random number of r.v.'s,⁷ as well as the multidimensional case, are easily
treated by the same methods [35].

Many new problems arise in ACP. The foremost corresponds to possible
relaxations of the uan condition. For instance, in the case of independence, the
relaxed condition

$$\max_k P(|X_{nk} - Y_k| > \epsilon) \to 0, \quad \text{for every } \epsilon > 0,$$

where $Y_1, Y_2, \cdots$ are independent, does not change, essentially, the nature
of the ACP. Yet, as soon as dependence is introduced, the whole outlook changes
and it would be interesting to investigate various new possibilities which thus
arise. On the other hand, conditions stricter than uan are of special interest
when independence is not assumed. The one which seems natural is the following:

$$\max_k \sup P'(|X_{nk}| > \epsilon) \to 0, \quad \text{for every } \epsilon > 0,$$

where $P'(A_{nk})$ denotes the conditional probability of the event $A_{nk}$, given
$X_{n,1}, \cdots, X_{n,k-1}$. An immediate problem is whether this or an analogous
restriction enables us to find, not only sufficient, but also necessary conditions
for various convergences and various cases of dependence.

II. The Strong Central Limit Problem 

6. The Bernoulli case and its extension. A sequence $\{X_n\}$ such that the
corresponding sequence of laws converges does not, in general, determine a r.v.
$X$ which might be considered, in some sense, as the limit of $X_n$. However, if we
define two r.v.'s $X$ and $X'$ such that $P(X \neq X') = 0$ as equivalent, then,
whenever $\mathcal{L}(X_m - X_n) \to \mathcal{L}(0)$ as $\frac{1}{m} + \frac{1}{n} \to 0$, the sequence $\{X_n\}$ determines a

⁷ H. Robbins (Bull. Am. Math. Soc., Vol. 54 (1948), pp. 1151-1161) studied in detail the
case of independent and identically distributed $X_k$'s with $EX_k^2 < \infty$ and $\nu_n$ independent
of the $X_k$'s, with $E\nu_n^2 < \infty$.





unique r.v. $X$ (up to an equivalence) for which $P(|X_n - X| > \epsilon) \to 0$ for
every $\epsilon > 0$. This $X$ is the limit in probability of $X_n$.

Yet, an observed sequence of values of $\{X_n\}$ need not converge to the
observed value of $X$. For instance, let $Y$ be a r.v. uniformly distributed over (0, 1).
Consider the sequence $\{D_n\}$ of partitions of (0, 1) into $n$ equal subintervals
and to the $k$-th subinterval of $D_n$ attach the indicator $X_{n,k}$ of the event that
$Y$ falls within this subinterval. The sequence $X_{1,1}$; $X_{2,1}$, $X_{2,2}$; $X_{3,1}$,
$X_{3,2}$, $X_{3,3}$; $\cdots$ converges in probability to zero since
$P(X_{nk} \neq 0) = \frac{1}{n}$, for $k = 1, 2, \cdots, n$, approaches zero as $n \to \infty$. On the
other hand, observed values of $X_{nk}$'s, for $k = 1, 2, \cdots, n$, will contain $n - 1$
zeros and a one, except in cases of total probability zero. Hence, except in these
cases, any observed sequence will contain infinitely many zeros and infinitely
many ones and will not converge.
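The counterexample above can be traced numerically. The following is a minimal sketch, assuming half-open subintervals $[(k-1)/n, k/n)$ so that every observed value of $Y$ falls in exactly one of them; the function name is illustrative.

```python
import random

def indicator_rows(y, n_max):
    """Rows X_{n,1..n} for n = 1..n_max, for one observed value y of Y in (0, 1).

    X_{n,k} = 1 if y falls in the k-th of n equal subintervals of (0, 1), else 0.
    Half-open intervals [(k-1)/n, k/n) are an assumption made here.
    """
    rows = []
    for n in range(1, n_max + 1):
        hit = min(int(y * n), n - 1)      # index of the subinterval containing y
        rows.append([1 if k == hit else 0 for k in range(n)])
    return rows

random.seed(0)
y = random.random()
rows = indicator_rows(y, 200)

# Each row holds n - 1 zeros and a single one, so the flattened sequence
# keeps returning to 1 and cannot converge for this observed y ...
assert all(sum(row) == 1 for row in rows)

# ... while P(X_{n,k} != 0) = 1/n still tends to zero.
print([1 / n for n in (1, 10, 100, 1000)])
```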

The Bernoulli theorem means only that $f_n - p$, where $f_n = \frac{S_n}{n}$, converges in
probability to zero. Borel showed, in a fundamental memoir ([5], 1909), that
Bernoulli's statement is too weak, and, in fact, that observed values of $f_n - p$
converge to zero, except in cases of total probability zero. Borel's proof is
based upon a direct analysis of the de Moivre-Laplace approach to NC. Thus a new
domain in probability theory was opened to exploration.

First Strong Limit Theorem. In the Bernoulli case

$$P(\lim_n f_n = p) = 1.$$

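This leads to the introduction in probability theory of the notion of almost
sure (a.s.) convergence:

$$X_n \xrightarrow{\text{a.s.}} X \quad \text{if} \quad P(\lim_n X_n = X) = 1,$$

or, equivalently, if for every $\epsilon > 0$,

$$P(|X_{n+k} - X| > \epsilon \text{ for } k = 1, \text{ or } 2, \text{ or } \cdots \text{ ad inf.}) \to 0 \text{ as } n \to \infty.$$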

If we denote by $A_n$ the event $|X_n - X| > \epsilon$, we see that we are concerned
here with

$$P = P(\text{realization of infinitely many events } A_n) = \lim_{n \to \infty} \lim_{\nu \to \infty} P(A_{n+1} \cup \cdots \cup A_{n+\nu}).⁸$$

From Boole's inequality

$$P(A_{n+1} \cup \cdots \cup A_{n+\nu}) \le \sum_{k=n+1}^{n+\nu} P(A_k)$$

follows, at once, the fundamental Borel-Cantelli Lemma. If $\sum_n P(A_n) < \infty$
then $P = 0$. This lemma can be extended, using sharper inequalities (Loève [32]).

⁸ Already Poincaré considered such probabilities in his investigation of "recurrence"
and this, before the notion of completely additive measures was born.





Now apply the Tchebicheff-Markov inequality

$$P(|X_n - X| > \epsilon) \le \frac{E|X_n - X|^r}{\epsilon^r}, \quad r > 0,$$

and the Cantelli criterion follows: if for some $r > 0$, $\sum_n E|X_n - X|^r < \infty$
then $X_n \xrightarrow{\text{a.s.}} X$.

Applying it, with $r = 4$, to the Bernoulli case, Cantelli [6] obtained an almost
immediate proof of Borel's result. An even simpler proof is as follows:
$\sum_n P(|f_{n^2} - p| > \epsilon) < \infty$ since $E(f_n - p)^2 = \frac{pq}{n}$, hence
$f_{n^2} - p \xrightarrow{\text{a.s.}} 0$. Moreover,

$$|f_\nu - f_{n^2}| \le \frac{2}{n} \quad \text{for } 0 \le \nu - n^2 \le 2n,$$

hence $f_\nu - f_{n^2} \to 0$ in the usual sense, uniformly in $\nu$, and the theorem is
proved. This last method applies as well to sequences of dependent events
$\{B_n\}$, which constitute a natural extension of the Bernoulli case. Let

$$p_1(n) = \frac{1}{n} \sum_{k=1}^{n} P(B_k), \quad p_2(n) = \frac{1}{C_n^2} \sum_{1 \le k < l \le n} P(B_k B_l),$$

where $C_n^2 = n(n-1)/2$, and $\delta_n = p_2(n) - p_1^2(n)$ (in the Bernoulli case
$\delta_n = 0$!). It is very easy to show that $f_n - p_1(n) \to 0$ in probability if, and
only if, $\delta_n \to 0$; this extends the Bernoulli theorem. Moreover, if
$n |\delta_n| \le C < \infty$ then $f_n - p_1(n) \xrightarrow{\text{a.s.}} 0$ (Loève [31]), and
Dvoretzky [10] proved that it is enough to have $\sum_n \frac{|\delta_n|}{n} < \infty$. Thus we
have a simple extension of Borel's result.
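The role of $\delta_n$ can be made concrete with a short computation. The sketch below is only an illustration: it takes the matrix of joint probabilities $P(B_k B_l)$ as given, averages over the pairs $k < l$ to form $p_2(n)$, and checks the elementary identity $\sigma^2(f_n) = \delta_n + (p_1(n) - p_2(n))/n$, which is one way to see why $f_n - p_1(n) \to 0$ in probability exactly when $\delta_n \to 0$. The two test cases (independent events, and one event repeated $n$ times) are hypothetical.

```python
def stability_terms(joint):
    """Return (p1, p2, delta, var_fn) for events B_1..B_n.

    joint[k][l] = P(B_k B_l), with joint[k][k] = P(B_k); joint is symmetric.
    p2 averages P(B_k B_l) over the pairs k < l.
    """
    n = len(joint)
    p1 = sum(joint[k][k] for k in range(n)) / n
    pairs = [(k, l) for k in range(n) for l in range(k + 1, n)]
    p2 = sum(joint[k][l] for k, l in pairs) / len(pairs)
    delta = p2 - p1 ** 2
    # Variance of the frequency f_n = (1/n) * (sum of the n indicators).
    second_moment = sum(joint[k][l] for k in range(n) for l in range(n)) / n ** 2
    var_fn = second_moment - p1 ** 2
    return p1, p2, delta, var_fn

def independent(p, n):
    return [[p if k == l else p * p for l in range(n)] for k in range(n)]

def identical(p, n):
    # All B_k are the same event, the extreme case of dependence.
    return [[p for _ in range(n)] for _ in range(n)]

for name, joint in (("independent", independent(0.3, 40)),
                    ("identical", identical(0.3, 40))):
    p1, p2, delta, var_fn = stability_terms(joint)
    n = len(joint)
    assert abs(var_fn - (delta + (p1 - p2) / n)) < 1e-12
    print(name, round(delta, 6), round(var_fn, 6))
```

In the independent case $\delta_n$ vanishes, while for identical events it stays at $p(1-p)$ and the frequency does not stabilize.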

The method used by Borel, while uselessly complicated in view of the result
obtained, is very powerful and, by sharpening it, the law of the iterated logarithm
(Khintchine [19]) follows.

Second Strong Limit Theorem. In the Bernoulli case

$$P\left\{ \limsup_n \frac{S_n - ES_n}{\sigma_n (2 \log \log \sigma_n)^{1/2}} = 1 \right\} = 1,$$

where $\sigma_n = \sigma(S_n)$.

Let us use the following terminology (P. Lévy [26]). A non-decreasing
sequence $\{\phi_n\}$ of positive numbers belongs to the lower class $L$, if the probability
that $S_n \le \phi_n$, from some $n$ onwards, is 1, and it belongs to the upper class $U$
if this probability is 0. The following criterion (Kolmogorov) applies: In the
Bernoulli case $\{\phi_n\}$ belongs to $L$ or $U$, respectively, according as
$\sum_n \frac{\phi_n}{n} e^{-\phi_n^2/2} = \infty$ or $< \infty$. Clearly this result contains
Khintchine's LIT.

7. The general case. The question of domains of validity of the obtained
results arises immediately and thus the SCLP appears in its present form. Let
$S_n = \sum_{k=1}^{n} X_k$ be sums of r.v.'s $X_k$, independent or not. Find conditions for
1° a.s. convergence of $\frac{S_n}{n}$ or, more generally [31], of $\frac{S_n}{a_n}$,
$a_n \uparrow \infty$ (SLLN); 2° the law of the iterated logarithm (LIT) and, more
generally, criteria for classifying sequences $\{\phi_n\}$.

The second problem, in the case of independent summands, possesses almost
complete solutions due, respectively, to Kolmogorov [17] and to Feller [13].

a. If $\sup |X_k| = o(\sigma_n / (\log \log \sigma_n)^{1/2})$ for $k \le n$, then LIT holds.

b. If $\sup |X_k| = O(\sigma_n / (\log \log \sigma_n)^{3/2})$ for $k \le n$, then the criterion for the
Bernoulli case continues to hold. (Feller also gave sharper criteria).

In the case of dependent summands general results were obtained by P. Lévy
[26] and for Markov chains by Doeblin [8]. The problem belongs (at present)
to the domain of NC; it is complicated and pries deeply into the behavior of
probabilities as $n \to \infty$. Yet, in the case of independence, the dichotomy into
classes $L$ and $U$ is more general as shown by the following property (P. Lévy
[26]). If $\{S_n\}$ is a sequence of consecutive sums of independent r.v.'s, and cannot
be reduced by adding constants to an a.s. convergent sequence, then, for any given
sequence $\{c_n\}$ of sure numbers, $P(S_n > c_n$ for an infinity of values of $n) = 0$
or 1.

The SLLN problem seems easier. Nevertheless it is far from being solved;
we don't even know necessary and sufficient conditions for the SLLN in the case
of independent summands in terms of individual d.f.'s.⁹ The essential tools are,
besides the fundamental Borel-Cantelli lemma, 1° the truncation method
together with the convergence in $r$-mean: $X_n \to X$ if $E|X_n - X|^r \to 0$ ($r > 0$),
2° the Kronecker lemma: If $\sum_k x_k / a_k$ is convergent, then
$\frac{1}{a_n} \sum_{k=1}^{n} x_k \to 0$ ($a_n \uparrow \infty$). It provides a possibility of
transforming problems about the SLLN into those of a.s. convergence of series of
r.v.'s, at least when sufficient conditions are sought for.
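A quick numerical look at the Kronecker lemma may help fix ideas. The sequences below are an illustrative choice, not taken from the text: $a_k = k$ and $x_k = (-1)^k \sqrt{k}$, so that $\sum x_k / a_k = \sum (-1)^k / \sqrt{k}$ is a convergent alternating series, and the lemma then asserts that $\frac{1}{n} \sum_{k \le n} x_k \to 0$.

```python
import math

def kronecker_demo(n):
    """Return (partial sum of x_k / a_k, (1 / a_n) * sum of x_k) up to n.

    Illustrative choice: a_k = k and x_k = (-1)^k * sqrt(k), so that
    x_k / a_k = (-1)^k / sqrt(k) is an alternating series with decreasing
    terms, hence convergent.
    """
    series = 0.0
    weighted = 0.0
    for k in range(1, n + 1):
        x_k = (-1) ** k * math.sqrt(k)
        series += x_k / k
        weighted += x_k
    return series, weighted / n

for n in (10, 1000, 100000):
    series, average = kronecker_demo(n)
    print(n, round(series, 4), round(average, 6))
```

The first printed column settles down while the second shrinks toward zero, which is the pattern the lemma predicts.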

In the case of independent summands one can start with the following
property of series (Lévy [23]): a.s. convergence of $\sum_k X_k$ is equivalent to
convergence in probability. (It can be shown that this property holds also for
certain classes of dependent summands.) On the other hand, convergence in q.m.
($r = 2$) entails convergence in probability. Hence, when $EX_k^2 < \infty$, taking
$EX_k$ as the origin of values of $X_k$, it follows that: If $\sum_n \sigma^2(X_n) < \infty$,
then $S_n$ a.s. converges. Kolmogorov proved this result using his celebrated
inequality which considerably strengthens that of Tchebicheff:

$$P(\max_{k \le n} |S_k| > \epsilon) \le \frac{\sigma^2(S_n)}{\epsilon^2}.$$
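The inequality can be checked by simulation. The following is a minimal Monte Carlo sketch, assuming independent summands taking the values $\pm 1$ with probability $1/2$ each (so that $EX_k = 0$ and $\sigma^2(S_n) = n$); the parameter values and function names are illustrative.

```python
import random

def max_abs_partial_sum(steps):
    """max over k <= n of |S_k| for one path of partial sums."""
    s, best = 0, 0
    for x in steps:
        s += x
        best = max(best, abs(s))
    return best

def estimate_exceedance(n, eps, trials, rng):
    """Monte Carlo estimate of P(max_{k<=n} |S_k| > eps) for +-1 summands."""
    hits = 0
    for _ in range(trials):
        steps = [rng.choice((-1, 1)) for _ in range(n)]
        if max_abs_partial_sum(steps) > eps:
            hits += 1
    return hits / trials

rng = random.Random(12345)
n, eps = 50, 15
estimate = estimate_exceedance(n, eps, trials=20000, rng=rng)
bound = n / eps ** 2          # sigma^2(S_n) / eps^2, each summand has variance 1
print(estimate, bound)
```

The estimate controls the whole path up to time $n$, whereas Tchebicheff's inequality with the same right-hand side would control only $|S_n|$.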

This inequality has been extended by P. Lévy [26], and by Loève [32] to
dependent summands and conditions for a.s. convergence were deduced from it.
If the $EX_k^2$ are not finite, the truncation method is applied. Put $X'_k = X_k$
if $|X_k| \le 1$ and $= 0$ if $|X_k| > 1$. Then (Khintchine-Kolmogorov) $\sum_n X_n$,

⁹ A first step in this direction is due to U. V. Prokhorov, "On the strong law of large
numbers" (in Russian), Dokl. Ak. Nauk, Vol. 69 (1949), pp. 607-610. See also a paper by
K. L. Chung to appear in the Proceedings of the Second Berkeley Symposium.





where $X_n$ are independent r.v.'s, is a.s. convergent if, and only if, $\sum_n P(X_n \neq X'_n)$,
$\sum_n EX'_n$, $\sum_n \sigma^2(X'_n)$ converge.

It is not difficult to obtain conditions for series of dependent summands.
Let $q_n(t) = P(|X_n| > t)$, $\xi_n = \int_{-\epsilon}^{+\epsilon} x\, dF'_n(x)$, where $F'_n(x)$ is
the conditional d.f. of $X_n$, given $X_1, \cdots, X_{n-1}$. If
$\sum_n \int_0^\epsilon t\, q_n(t)\, dt < \infty$ for an $\epsilon > 0$, then $\sum_n (X_n - \xi_n)$ a.s. converges.

By using Kronecker's lemma the results above yield immediately sufficient
conditions for the SLLN. Those which come from the last one would in turn
yield without difficulty the following: Let $a_n \uparrow \infty$ and
$\eta_n = \int_{-\epsilon a_n}^{+\epsilon a_n} x\, dF'_n(x)$. If $\sum_n q_n(a_n t) \le q(t)$ and
$\int_0^\epsilon t\, q(t)\, dt < \infty$, then $\frac{1}{a_n} \sum_{k=1}^{n} (X_k - \eta_k) \xrightarrow{\text{a.s.}} 0$.

Take now the particular case: $a_n = n^{1/r}$, and $X_k$'s independent and identically
distributed. From the stated result follows:

1. If $EX_k = m$ exists, then $\frac{1}{n} \sum_{k=1}^{n} X_k \xrightarrow{\text{a.s.}} m$ and conversely (Kolmogorov).

2. If $0 < r < 2$, $r \neq 1$, $E|X_k|^r < \infty$ and $\lim_{a \to \infty} \int_{-a}^{+a} x\, dF(x) = 0$, then
$\frac{1}{n^{1/r}} \sum_{k=1}^{n} X_k \xrightarrow{\text{a.s.}} 0$ (Marcinkiewicz).


Other conditions for SLLN, in the case of dependence, are known (Lévy [27],
Loève [32]).

The above result of Kolmogorov is a particular case of the celebrated ergodic
theorem (Birkhoff [4]) which can be considered as a SLLN for a special case of
dependence. Let $A_n$ be an event defined on the set $(X_{k_1}, \cdots, X_{k_n})$ and
let $A_n^{(m)}$ be an event defined in the same manner on the translated set
$(X_{k_1+m}, \cdots, X_{k_n+m})$. The sequence $\{X_k\}$ is called stationary if
$P(A_n^{(m)}) = P(A_n)$ for every finite set $(k_1, \cdots, k_n)$ and every finite $m$. The
ergodic theorem states that: If the sequence $\{X_k\}$ is stationary and
$E|X_k| < \infty$, then $\frac{1}{n} \sum_{k=1}^{n} X_k$ converges a.s.¹⁰

However an unsatisfactory feature of Birkhoff's theorem (and of its
extensions) is that the conditions are not asymptotic (they have to be satisfied for
every $n$ and not for $n \to \infty$) while the conclusion is an asymptotic one. Let us
only mention that more satisfactory ones, at least from this point of view,
which contain the previous ones, can be found.

¹⁰ For about fifteen years Khintchine, Kolmogorov, Wiener, Yosida and Kakutani,
F. Riesz, worked to simplify the proof of this theorem. It is only lately that its domain
of validity has been extended by Hurewicz, by Halmos, and by Dunford and Miller. See
also a forthcoming paper by the author in the Proceedings of the Second Berkeley
Symposium.





The bird’s-eye view above of the SCLP shows that this problem is only in a 
tentative stage, perhaps because no adequately powerful methods or no ade¬ 
quately general approach to the problem had been found until now. 

REFERENCES 

[1] G. Bawly, "Über einige Verallgemeinerungen der Grenzwertsätze der Wahrscheinlichkeitsrechnung," Rec. Math. Moscou, Vol. 43 (1936), pp. 917-929.

[2] J. Bernoulli, Ars conjectandi, 1713.

[3] S. Bernstein, "Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes," Math. Ann., Vol. 97 (1927), pp. 1-59.

[4] G. D. Birkhoff, "Proof of the ergodic theorem," Proc. Nat. Acad. Sc., Vol. 17 (1931), pp. 660-666.

[5] E. Borel, "Sur les probabilités dénombrables et leurs applications arithmétiques," Circ. Mat. d. Palermo, Vol. 26 (1909), pp. 247-271.

[6] F. P. Cantelli, "Sulla probabilità come limite della frequenza," Rend. Accad. d. Lincei, Vol. 26 (1917), pp. 39-46.

[7] A. De Moivre, Miscellanea analytica, 1730.

[8] W. Doeblin, "Sur les propriétés asymptotiques . . . de chaînes simples," Bull. Math. Soc. Roum. Sc. (1937), pp. 1-120.

[9] W. Doeblin, "Sur les sommes d'un grand nombre de variables aléatoires indépendantes," Bull. Sc. Math., Vol. 63 (1939), pp. 23-64.

[10] A. Dvoretzky, "On the strong stability of a sequence of events," Annals of Math. Stat., Vol. 20 (1949), pp. 296-299.

[11] W. Feller, "Ueber den Zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung," Math. Zeit., Vol. 40 (1935), pp. 521-559; Vol. 42 (1937), pp. 301-312.

[12] W. Feller, "Ueber das Gesetz der grossen Zahlen," Acta Litt. Ac. Scient., Vol. 8 (1937), pp. 191-201.

[13] W. Feller, "The general form of the so-called law of the iterated logarithm," Trans. Am. Math. Soc., Vol. 54 (1943), pp. 373-402.

[14] B. Gnedenko, "On convergence of distribution laws of independent random variables" (in Russian), Dokl. Ak. Nauk, Vol. 18 (1938), pp. 231-234.

[15] B. Gnedenko, "On the theory of limit theorems for sums of independent random variables" (in Russian), Bull. Ac. Sci. URSS (1939), pp. 181-232, pp. 643-647.

[16] A. Kolmogoroff, "Ueber die Summen durch den Zufall bestimmter unabhängiger Grössen," Math. Ann., Vol. 99 (1928), pp. 309-319; Vol. 102 (1929), pp. 484-489.

[17] A. Kolmogoroff, "Ueber das Gesetz des iterierten Logarithmus," Math. Ann., Vol. 101 (1929), pp. 126-135.

[18] A. Kolmogoroff, "Sulla forma generale di un processo stocastico omogeneo," Rend. Accad. d. Lincei, Vol. 15 (1932), pp. 805-806, pp. 866-869.

[19] A. Khintchine, "Ueber einen Satz der Wahrscheinlichkeitsrechnung," Fund. Math., Vol. 6 (1924), pp. 9-20.

[20] A. Khintchine, "Zur Theorie der unbeschränkt teilbaren Verteilungsgesetze," Rec. Math. Moscou, Vol. 44 (1937), pp. 79-119.

[21] P. S. Laplace, Traité des Probabilités, 1812.

[22] P. Lévy, Calcul des Probabilités, Gauthier-Villars, 1925.

[23] P. Lévy, "Sur les séries dont les termes sont des variables éventuelles indépendantes," Studia Math., Vol. 3 (1931), pp. 117-165.

[24] P. Lévy, "Sur les intégrales dont les éléments sont des variables aléatoires indépendantes," Ann. d. Scuola Nor. di Pisa, Vol. 3 (1934), pp. 337-366.

[25] P. Lévy, "Propriétés asymptotiques des sommes de variables aléatoires indépendantes ou enchaînées," Jour. Math. Pures Appl., Vol. 14 (1935), pp. 347-402.





[26] P. Lévy, "La loi forte des grands nombres pour les variables aléatoires enchaînées," Jour. Math. Pures Appl., Vol. 15 (1936), pp. 11-24.

[27] P. Lévy, Théorie de l'addition des variables aléatoires, Gauthier-Villars, 1937.

[28] A. Liapounoff, "Sur une proposition de la théorie des probabilités," Bull. Acad. Sci. St. Petersbourg, (5), Vol. 13 (1900), pp. 359-386.

[29] A. Liapounoff, "Nouvelle forme du théorème sur la limite de probabilités," Mem. Ac. Sc. St. Petersbourg, (8), Vol. 12 (1901).

[30] J. Lindeberg, "Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung," Math. Zeit., Vol. 15 (1922), pp. 211-225.

[31] M. Loève, "La tendance centrale pour des variables aléatoires liées," C. R. Ac. Sc. Paris, Vol. 212 (1940).

[32] M. Loève, "Étude asymptotique des sommes de variables aléatoires liées," Jour. Math. Pures Appl., Vol. 24 (1945), pp. 249-318.

[33] M. Loève, "Sur l'équivalence asymptotique des lois," C. R. Ac. Sc. Paris, Vol. 227 (1948), pp. 1335-1337.

[34] M. Loève, "On the Central Probability problem," Proc. Nat. Acad. Sci., Vol. 35 (1949), pp. 328-332.

[35] M. Loève, "On sets of Probability laws and their limit elements," Statistical Series, Vol. 1, pp. 53-88, University of California Press, Berkeley, California.

[36] J. Marcinkiewicz, "Sur les fonctions indépendantes," Fund. Math., Vol. 30 (1938), pp. 349-364.

[37] A. Markoff, Calculus of probabilities (Russian), 1913.

[38] Poisson, Recherches sur la Probabilité des Jugements, 1837, pp. 206-207.

[39] D. Raïkoff, "On connection between the central limit theorem of probability theory and the law of large numbers," Bull. Ac. Sci. URSS, Ser. Math., (1938), pp. 323-338.

[40] P. L. Tchebicheff, Oeuvres, Vol. I, pp. 687-694; Vol. II, pp. 481-492.



A RANDOM VARIABLE RELATED TO THE SPACING OF
SAMPLE VALUES

By B. Sherman¹

University of Southern California 

1. Introduction and summary. Let $x$ be a random variable with continuous
distribution function $F(x)$. Then $y = F(x)$ is a random variable uniformly
distributed over [0, 1]. If $x_1, x_2, \cdots, x_n$ is an ordered sample of $n$ values from the
population $F(x)$ then $y_1, y_2, \cdots, y_n$ ($y_i = F(x_i)$) is an ordered sample of $n$
values from a uniform distribution over [0, 1]. For $n$ large it is reasonable to
expect that the $y_i$ should be fairly uniformly spaced. Measures of the deviation
from uniform spacing can be devised in various ways. Thus Kimball [2] has
studied the random variable

$$\alpha = \sum_{i=1}^{n+1} \left( F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right)^2,$$

where $x_0 = -\infty$ and $x_{n+1} = +\infty$, conjecturing that $\alpha$ is asymptotically
normally distributed. Moran [3] has studied the random variable

$$\beta = \sum_{i=1}^{n+1} (F(x_i) - F(x_{i-1}))^2,$$

which differs from $\alpha$ only by the quantity $-2/(n+1) + (n+1)^{-1}$, and has
proved that $\beta$ is asymptotically normally distributed. Somewhat related to these
two random variables is the quantity $\omega^2$ introduced by Smirnoff [4]. This is

$$\omega^2 = n \int_{-\infty}^{+\infty} (F(x) - F^*(x))^2\, dF(x),$$

although it is slightly more generally defined in Smirnoff's paper. Here $F^*(x)$
is the sample distribution function ([1], page 325) of a sample of $n$ values from
the population with continuous distribution function $F(x)$. The variable $\omega^2$ may
be written ([1], page 451)

$$\omega^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left( F(x_i) - \frac{2i - 1}{2n} \right)^2;$$

$(2i - 1)/2n$ is the midpoint of the interval $((i - 1)/n, i/n)$. Thus, if [0, 1]
is partitioned into $n$ equal subintervals then $\omega^2$ measures the deviation of the
sample values $y_i = F(x_i)$, $i = 1, 2, \cdots, n$, from the midpoints of these
intervals. Smirnoff has investigated the asymptotic behavior of $\omega^2$, obtaining a
rather complicated non-normal asymptotic distribution.
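The three spacing measures just described are easy to compute once the values $y_i = F(x_i)$ are in hand. The sketch below is illustrative only: the function names and the sample values are hypothetical, and the last function uses the computing form $\frac{1}{12n} + \sum (y_i - \frac{2i-1}{2n})^2$. It also checks the stated relation between $\alpha$ and $\beta$, which amounts to $\alpha - \beta = -1/(n+1)$.

```python
def spacings(y):
    """Spacings F(x_i) - F(x_{i-1}), i = 1..n+1, with F(x_0) = 0, F(x_{n+1}) = 1."""
    pts = [0.0] + sorted(y) + [1.0]
    return [b - a for a, b in zip(pts, pts[1:])]

def kimball_alpha(y):
    d = spacings(y)
    c = 1.0 / len(d)                      # 1 / (n + 1)
    return sum((di - c) ** 2 for di in d)

def moran_beta(y):
    return sum(di ** 2 for di in spacings(y))

def smirnov_omega2(y):
    """Computing form 1/(12n) + sum (y_i - (2i-1)/(2n))^2 for sorted y."""
    n = len(y)
    return 1.0 / (12 * n) + sum((yi - (2 * i - 1) / (2 * n)) ** 2
                                for i, yi in enumerate(sorted(y), start=1))

y = [0.05, 0.21, 0.48, 0.50, 0.93]        # hypothetical values of F(x_i)
n = len(y)
# alpha - beta = -2/(n+1) + 1/(n+1) = -1/(n+1)
assert abs(kimball_alpha(y) - moran_beta(y) + 1.0 / (n + 1)) < 1e-12
print(kimball_alpha(y), moran_beta(y), smirnov_omega2(y))
```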

¹ I wish to thank Professors J. W. Tukey and S. S. Wilks for their helpful suggestions
and criticism.




It is possible to construct a definition of deviation from uniform spacing which
permits a broader investigation than these random variables. This is

$$\omega_n = \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right|,$$

where again $x_0 = -\infty$ and $x_{n+1} = +\infty$ and $F(x)$ is a continuous distribution
function. (In Theorems 3 and 4 it is assumed additionally that $F'(x)$ exists and
is continuous except for a finite number of points). It is to be noted that

$$0 \le \omega_n \le 1.$$

Generally speaking use of the absolute value in circumstances like this is an
undesirable procedure, but it turns out that $\omega_n$ is relatively easy to handle,
allowing a fairly simple calculation of its moments (which are independent of
$F(x)$). These are ($\mu = \min(k, n)$)



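Thus in particular the mean of $\omega_n$ is

$$E(\omega_n) = \left( \frac{n}{n+1} \right)^{n+1}$$

and the variance is

$$D^2(\omega_n) = \frac{2n^{n+2} + n(n-1)^{n+2}}{(n+2)(n+1)^{n+2}} - \left( \frac{n}{n+1} \right)^{2n+2} \sim \frac{2e - 5}{e^2} \cdot \frac{1}{n}.$$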

These results will be established in Theorem 1. From the moments the
characteristic function of $\omega_n$ may be obtained, and indeed in finite terms. From the
characteristic function the distribution function of $\omega_n$ may be readily calculated.
The distribution function is written out explicitly at the end of Theorem 1.
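The statistic and its first two moments lend themselves to a quick simulation check. The sketch below is illustrative, not the author's computation: it assumes the closed forms $E(\omega_n) = (n/(n+1))^{n+1}$ and $D^2(\omega_n) = \frac{2n^{n+2} + n(n-1)^{n+2}}{(n+2)(n+1)^{n+2}} - (n/(n+1))^{2n+2}$, draws uniform samples (which is legitimate because the distribution of $\omega_n$ does not depend on $F$), and prints simulated against exact values. Function names, the seed and the sample sizes are arbitrary choices.

```python
import random

def sherman_omega(y):
    """omega_n = (1/2) * sum_{i=1}^{n+1} |F(x_i) - F(x_{i-1}) - 1/(n+1)|."""
    pts = [0.0] + sorted(y) + [1.0]
    c = 1.0 / (len(y) + 1)
    return 0.5 * sum(abs(b - a - c) for a, b in zip(pts, pts[1:]))

def exact_mean(n):
    return (n / (n + 1)) ** (n + 1)

def exact_variance(n):
    first = (2 * n ** (n + 2) + n * (n - 1) ** (n + 2)) / ((n + 2) * (n + 1) ** (n + 2))
    return first - (n / (n + 1)) ** (2 * n + 2)

rng = random.Random(2024)
n, trials = 20, 20000
values = [sherman_omega([rng.random() for _ in range(n)]) for _ in range(trials)]
mean = sum(values) / trials
var = sum((v - mean) ** 2 for v in values) / (trials - 1)
print(mean, exact_mean(n))
print(var, exact_variance(n))
```

For $n = 1$ the statistic reduces to $|y - \frac{1}{2}|$ with $y$ uniform, whose mean $\frac{1}{4}$ and variance $\frac{1}{48}$ agree with the two formulas.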

To determine the asymptotic distribution of the standardized variable

$$\frac{\omega_n - E(\omega_n)}{D(\omega_n)},$$

it is sufficient to examine the behaviour as $n \to \infty$ of the moments of this variable
or equivalently the moments of the variable 

For it is easy to show that if the moments of the standardized variable approach
the moments of a unique distribution function $F(x)$ then the distribution
function of the standardized variable approaches $F(x)$. In this manner it is proved
in Theorem 2 that the distribution function of the standardized variable
approaches normality.

Since the asymptotic distribution of the standardized variable

$$\frac{\omega_n - E(\omega_n)}{D(\omega_n)}$$

is known it may be used as a test for goodness of fit if the number of sample
values is large. Thus suppose $x_1, x_2, \cdots, x_n$ is an ordered sample of $n$ values
from some population and we wish to test the hypothesis that the population has
the distribution function $F(x)$. Then we calculate the quantity

$$X_n = \frac{1}{D(\omega_n)} \left[ \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| - E(\omega_n) \right],$$

and if this quantity exceeds a certain value which depends on the level of
significance at which we are working we reject the hypothesis. Let us say that
$P(X_n > A) = B$. The probability of rejecting the hypothesis when it is indeed
true is then precisely $B$ and this is small if $A$ is sufficiently large. But suppose
that the hypothesis is false and the sample values come from a population whose
distribution function $G(x) \not\equiv F(x)$. Then we would desire the following property
to hold for the random variable $X_n$, namely, for any fixed positive $A$ the
probability that $X_n$ exceeds $A$ approaches 1 as $n \to \infty$. For in this case (and when $n$
is large) we are almost certain to reject the null hypothesis when it is false. A
test for goodness of fit which satisfies this criterion, i.e. where the probability
of rejection approaches 1 as $n \to \infty$ when the null hypothesis is false, is called
consistent by Wald and Wolfowitz [5]. We wish to prove then that the test for
goodness of fit which uses the random variable $X_n$ is consistent. To express the
matter formally we wish to prove that (the probability density element of
$x_1, x_2, \cdots, x_n$ is $n!\, dG(x_1)\, dG(x_2) \cdots dG(x_n)$ in the region

$$-\infty < x_1 < x_2 < \cdots < x_n < +\infty$$

and zero outside that region)

$$\lim_{n \to \infty} n! \int \cdots \int_{D_1} dG(x_1)\, dG(x_2) \cdots dG(x_n) =
\begin{cases}
\frac{1}{\sqrt{2\pi}} \int_A^{\infty} e^{-t^2/2}\, dt & \text{if } F(x) \equiv G(x), \\
1 & \text{if } F(x) \not\equiv G(x),
\end{cases}$$

where $D_1$ is the domain

$$-\infty < x_1 < x_2 < \cdots < x_n < +\infty,$$

$$\frac{1}{D(\omega_n)} \left[ \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| - E(\omega_n) \right] > A.$$

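The first assertion here is proved in Theorem 2. The second assertion is equivalent
to proving that for any fixed positive $A$

$$(0.1) \qquad \lim_{n \to \infty} n! \int \cdots \int_{D_2} dG(x_1)\, dG(x_2) \cdots dG(x_n) = 0,$$

where $D_2$ is the domain

$$-\infty < x_1 < x_2 < \cdots < x_n < +\infty,$$

$$E(\omega_n) - A D(\omega_n) < \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| < E(\omega_n) + A D(\omega_n),$$

when $F(x) \not\equiv G(x)$. Now $D(\omega_n)$ is of order $n^{-1/2}$, $E(\omega_n) = e^{-1} +$ terms of order
$n^{-1}$ and $A$ is fixed. Hence it is sufficient to show that, if $x_1, x_2, \cdots, x_n$ is an
ordered sample of $n$ values from a population with distribution function $G(x)$,
then the random variable

$$\Omega_n = \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right|$$

(it is necessary to draw a distinction between $\omega_n$ and $\Omega_n$ since $F(x) \not\equiv G(x)$) has
a mean $L_n \to L \neq e^{-1}$ and a variance $D^2(\Omega_n) \to 0$. For then we have, when $n$ is
large enough so that the interval

$$[E(\omega_n) - A D(\omega_n),\ E(\omega_n) + A D(\omega_n)]$$

falls outside $[L - \frac{1}{2} |L - e^{-1}|,\ L + \frac{1}{2} |L - e^{-1}|]$ and $|L_n - L| < \frac{1}{4} |L - e^{-1}|$,

$$P(E(\omega_n) - A D(\omega_n) < \Omega_n < E(\omega_n) + A D(\omega_n))
\le P(|\Omega_n - L| \ge \tfrac{1}{2} |L - e^{-1}|)$$
$$\le P(|\Omega_n - L_n| \ge \tfrac{1}{4} |L - e^{-1}|)
\le \frac{E(|\Omega_n - L_n|)}{\frac{1}{4} |L - e^{-1}|}
\le \frac{D(\Omega_n)}{\frac{1}{4} |L - e^{-1}|},$$

and this implies (0.1).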

But now in Theorem 3 it is shown that the mean of the random variable $\Omega_n$
is (writing $k(x) = GF^{-1}(x)$, $k(x)$ a monotonic function such that $k(0) = 0$ and
$k(1) = 1$)

This expression approaches

$$\int_0^1 e^{-k'(x)}\, dx$$

and this integral can assume the value $e^{-1}$, which is its minimum relative to the
class of monotonic functions such that $k(0) = 0$ and $k(1) = 1$, only when $k(x) = x$,
i.e. $F(x) = G(x)$. Finally in Theorem 4 we prove that $D^2(\Omega_n) \to 0$ and thus it is
established that the test for goodness of fit based on $X_n$ is consistent.
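The minimum property of the limiting mean can be checked numerically. The sketch below assumes the limit has the form $\int_0^1 e^{-k'(x)}\, dx$ and evaluates it with a midpoint rule; $k(x) = x^2$ is a hypothetical alternative chosen only for illustration, and the function name and step count are arbitrary.

```python
import math

def limiting_mean(k_prime, steps=100000):
    """Midpoint-rule value of the integral of exp(-k'(x)) over [0, 1]."""
    h = 1.0 / steps
    return h * sum(math.exp(-k_prime((j + 0.5) * h)) for j in range(steps))

# k(x) = x corresponds to F = G; k(x) = x^2 is a hypothetical alternative.
null_value = limiting_mean(lambda x: 1.0)
alt_value = limiting_mean(lambda x: 2.0 * x)

assert abs(null_value - math.exp(-1)) < 1e-9
assert alt_value > null_value
print(null_value, alt_value)
```

The inequality reflects Jensen's inequality for the convex function $e^{-t}$: since $\int_0^1 k'(x)\, dx = 1$, the integral is at least $e^{-1}$, with equality only when $k'(x) = 1$.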

2. Moments and asymptotic distribution of $\omega_n$.

Theorem 1. Let $F(x)$ be a continuous distribution function. If $x_1, x_2, \cdots, x_n$
is an ordered sample of $n$ values from the population whose distribution function is
$F(x)$ then the random variable

$$\omega_n = \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right|,$$

where $x_0 = -\infty$ and $x_{n+1} = +\infty$, has the moments

$\alpha_{nk}$

where $\mu = \min(k, n)$.

The probability density element of the $x_i$ is ([6], page 90)

$$n!\, dF(x_1)\, dF(x_2) \cdots dF(x_n)$$

in the domain $D_x$: $-\infty < x_1 < x_2 < \cdots < x_n < +\infty$ and zero outside of this
domain. Then

$$\alpha_{nk} = n! \int \cdots \int_{D_x} \omega_n^k\, dF(x_1)\, dF(x_2) \cdots dF(x_n).$$

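If we make the transformation $y_i = F(x_i)$, $i = 1, 2, \cdots, n$, then

$$\alpha_{nk} = \frac{n!}{2^k} \int \cdots \int_{D_y} \left[ \sum_{i=1}^{n+1} \left| y_i - y_{i-1} - \frac{1}{n+1} \right| \right]^k dy_1\, dy_2 \cdots dy_n,$$

where $D_y$ is the domain $0 < y_1 < y_2 < \cdots < y_n < 1$, thus indicating that the
moments of $\omega_n$ (and therefore also the distribution function of $\omega_n$) are
independent of $F(x)$. Here $y_0 = 0$ and $y_{n+1} = 1$. The transformation

$$u_1 = y_1, \qquad y_1 = u_1,$$
$$u_2 = y_2 - y_1, \qquad y_2 = u_1 + u_2,$$
$$\cdots$$
$$u_n = y_n - y_{n-1}, \qquad y_n = u_1 + u_2 + \cdots + u_n,$$
$$u_{n+1} = y_{n+1} - y_n, \qquad y_{n+1} = u_1 + u_2 + \cdots + u_n + u_{n+1} = 1,$$

whose Jacobian is 1, then yields

$$\alpha_{nk} = \frac{n!}{2^k} \int \cdots \int_{D_u} \left[ \sum_{i=1}^{n} \left| u_i - \frac{1}{n+1} \right| + \left| \frac{n}{n+1} - (u_1 + u_2 + \cdots + u_n) \right| \right]^k du_1 \cdots du_n,$$

where $D_u$ is the domain $\sum_{i=1}^{n} u_i < 1$, $u_i > 0$, $i = 1, 2, \cdots, n$.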
•-1 

The domain $D_u$ can be regarded as the union of $2^{n+1} - 2$ subdomains in the
following way. First the hyperplane $u_1 + u_2 + \cdots + u_n = n/(n+1)$ divides
the domain into two parts. In the part of the domain below the hyperplane,
i.e. where $u_1 + u_2 + \cdots + u_n < n/(n+1)$, we have a subdomain defined
by the statement: $r$ of the variables $u_i$ are greater than $(n+1)^{-1}$ and the
residual group of $n - r$ are less than $(n+1)^{-1}$. There are $\binom{n}{r}$ such
subdomains and it is clear that, because of the symmetry in the $u_i$, the integral of
$\left[ \sum_{i=1}^{n+1} \left| u_i - \frac{1}{n+1} \right| \right]^k$ over each such subdomain is the same. There are
altogether $\sum_{r=0}^{n-1} \binom{n}{r} = 2^n - 1$ such subdomains, $r \neq n$ because of the inequality
$u_1 + u_2 + \cdots + u_n < n/(n+1)$. In the part of the domain above the
hyperplane

$$u_1 + u_2 + \cdots + u_n = n/(n+1),$$

i.e. where $u_1 + u_2 + \cdots + u_n > n/(n+1)$, the reasoning is exactly the same
except that here $r \neq 0$. Thus we may write

“•*" § C") /■ ■ ■ / [,S. (^T“1 - “■)]* 


^r2 

1 ' 
n + l, 

V "|k 

1 dui dui • • 

• dUn, 

where Dri is the domain 




n ^ 

2 Mt •< 111 M, > 

,_i n -f 1 n 

1 

+ 1 

0 = 1,2,- 

■ •, r), 

0 < M. < —^ 
n 4- 1 


(^ = ^4-l,• 

• •, m), 

and Dri is the domain 




n 

, < £ M, < 1 , M, > 

M 1 ,-1 

1 

M 4- 1 

(i = 1,2. • 

• ■, r), 

0 < M, < - ^ 

n -f 1 


(^ = r 4- 1, • 

• • , n). 

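If we introduce the variables

$$z_i = u_i - \frac{1}{n+1}\ \ (i = 1, 2, \ldots, r),\qquad z_i = \frac{1}{n+1} - u_i\ \ (i = r+1, \ldots, n),$$

we get

$$\alpha_{nk} = n!\sum_{r=0}^{n-1}\binom{n}{r}\int\cdots\int_{\Delta_{r1}}\left(\sum_{i=r+1}^{n}z_i\right)^k dz_1\cdots dz_n + n!\sum_{r=1}^{n}\binom{n}{r}\int\cdots\int_{\Delta_{r2}}\left(\sum_{i=1}^{r}z_i\right)^k dz_1\cdots dz_n,$$

where $\Delta_{r1}$ is the domain

$$\sum_{i=1}^{r}z_i < \sum_{i=r+1}^{n}z_i,\qquad z_i > 0\ \ (i = 1, 2, \ldots, r),\qquad \frac{1}{n+1} > z_i > 0\ \ (i = r+1, \ldots, n),$$

and $\Delta_{r2}$ is the domain

$$\sum_{i=r+1}^{n}z_i < \sum_{i=1}^{r}z_i < \frac{1}{n+1} + \sum_{i=r+1}^{n}z_i,\qquad z_i > 0\ \ (i = 1, 2, \ldots, r),\qquad \frac{1}{n+1} > z_i > 0\ \ (i = r+1, \ldots, n).$$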


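To effect the integrations with respect to the variables $z_1, z_2, \ldots, z_r$ we take as volume element in the $r$-space of $z_1, z_2, \ldots, z_r$ the volume between the hyperplanes $z_1 + z_2 + \cdots + z_r = C$, $z_i > 0$ and $z_1 + z_2 + \cdots + z_r = C + dC$, $z_i > 0$. This volume element is $d\,\dfrac{C^r}{r!} = \dfrac{C^{r-1}}{(r-1)!}\,dC$. Thus (the inner integral of the first sum being read as 1 when $r = 0$)

$$\begin{aligned}
\alpha_{nk} ={}& n!\sum_{r=0}^{n-1}\binom{n}{r}\int_0^{1/(n+1)}\!\!\cdots\int_0^{1/(n+1)}\left[\int_0^{\sum_{i=r+1}^{n}z_i}\frac{C^{r-1}}{(r-1)!}\,dC\right]\left(\sum_{i=r+1}^{n}z_i\right)^k dz_{r+1}\cdots dz_n\\
&+ n!\sum_{r=1}^{n}\binom{n}{r}\int_0^{1/(n+1)}\!\!\cdots\int_0^{1/(n+1)}\left[\int_{\sum_{i=r+1}^{n}z_i}^{\frac{1}{n+1}+\sum_{i=r+1}^{n}z_i}\frac{C^{k+r-1}}{(r-1)!}\,dC\right]dz_{r+1}\cdots dz_n\\
={}& n!\sum_{r=0}^{n-1}\binom{n}{r}\int_0^{1/(n+1)}\!\!\cdots\int_0^{1/(n+1)}\frac{1}{r!}\,(z_{r+1} + \cdots + z_n)^{k+r}\,dz_{r+1}\cdots dz_n\\
&+ n!\sum_{r=1}^{n}\binom{n}{r}\int_0^{1/(n+1)}\!\!\cdots\int_0^{1/(n+1)}\frac{1}{(k+r)(r-1)!}\left(\frac{1}{n+1} + z_{r+1} + \cdots + z_n\right)^{k+r}dz_{r+1}\cdots dz_n\\
&- n!\sum_{r=1}^{n-1}\binom{n}{r}\int_0^{1/(n+1)}\!\!\cdots\int_0^{1/(n+1)}\frac{1}{(k+r)(r-1)!}\,(z_{r+1} + \cdots + z_n)^{k+r}\,dz_{r+1}\cdots dz_n.
\end{aligned}$$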

In order to perform these integrations we use the formula

$$\int_0^A\cdots\int_0^A(B + x_1 + x_2 + \cdots + x_n)^m\,dx_1\cdots dx_n = \frac{m!}{(m+n)!}\sum_{q=0}^{n}(-1)^{n-q}\binom{n}{q}(B + qA)^{m+n},$$

which is established immediately by induction on $n$. Then


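$$\begin{aligned}
\alpha_{nk} ={}& n!\sum_{r=0}^{n-1}\sum_{q=0}^{n-r}(-1)^{n-r-q}\,\frac{(k+r)!}{r!\,(n+k)!}\binom{n}{r}\binom{n-r}{q}\left(\frac{q}{n+1}\right)^{n+k}\\
&+ n!\sum_{r=1}^{n}\sum_{q=0}^{n-r}(-1)^{n-r-q}\,\frac{(k+r-1)!}{(r-1)!\,(n+k)!}\binom{n}{r}\binom{n-r}{q}\left(\frac{q+1}{n+1}\right)^{n+k}\\
&- n!\sum_{r=1}^{n-1}\sum_{q=0}^{n-r}(-1)^{n-r-q}\,\frac{(k+r-1)!}{(r-1)!\,(n+k)!}\binom{n}{r}\binom{n-r}{q}\left(\frac{q}{n+1}\right)^{n+k}.
\end{aligned}$$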
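The first of these double sums is equal to

$$\frac{n!\,k!}{(n+k)!}\sum_{q=0}^{n}\binom{n}{q}\left(\frac{q}{n+1}\right)^{n+k}\left[\sum_{r=0}^{n-q}(-1)^{n-q-r}\binom{n-q}{r}\binom{k+r}{r}\right].$$

Let us assume first that $n \ge k$. The expression within the brackets is the coefficient of $x^{n-q}$ in $(1-x)^{n-q}\left(1/(1-x)\right)^{k+1} = (1-x)^{n-q-k-1}$ and this is $\ne 0$ only when $q \ge n-k$ and then it has the value $\binom{k}{n-q}$. Thus the first double sum is equal to

$$\binom{n+k}{k}^{-1}\sum_{q=n-k}^{n}\binom{k}{n-q}\binom{n}{q}\left(\frac{q}{n+1}\right)^{n+k} = \binom{n+k}{k}^{-1}\sum_{s=0}^{k}\binom{k}{s}\binom{n}{s}\left(\frac{n-s}{n+1}\right)^{n+k}.$$

Similarly the second double sum is equal to

$$\binom{n+k}{k}^{-1}\sum_{s=0}^{k-1}\binom{k-1}{s}\binom{n}{s+1}\left(\frac{n-s}{n+1}\right)^{n+k}$$

and the third is equal to

$$\binom{n+k}{k}^{-1}\sum_{s=1}^{k}\binom{k-1}{s-1}\binom{n}{s}\left(\frac{n-s}{n+1}\right)^{n+k}.$$

Thus, using the identity

$$\binom{k}{s}\binom{n}{s} + \binom{k-1}{s}\binom{n}{s+1} - \binom{k-1}{s-1}\binom{n}{s} = \binom{k-1}{s}\binom{n+1}{s+1},$$

we get

$$\alpha_{nk} = \binom{n+k}{k}^{-1}\sum_{s=0}^{k-1}\binom{n+1}{s+1}\binom{k-1}{s}\left(\frac{n-s}{n+1}\right)^{n+k}.$$

If however $k > n$ then a similar argument shows that we get an expression for $\alpha_{nk}$ which differs from the above only in the upper limit of the summation, which is $n - 1$ in this case. Thus the theorem is proved.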
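Two ingredients of the proof lend themselves to an exact machine check: the iterated-integral formula and the binomial identity used in the last step. The Python sketch below is illustrative only. It assumes the integral formula has upper limits $A$ in every variable and reads $\int_0^A\cdots\int_0^A(B + x_1 + \cdots + x_n)^m\,dx_1\cdots dx_n = \frac{m!}{(m+n)!}\sum_{q=0}^{n}(-1)^{n-q}\binom{n}{q}(B+qA)^{m+n}$, and it evaluates the left side by repeated exact polynomial integration with rational arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def iterated_integral(m, n, A, B):
    """Exact n-fold integral over [0, A]^n of (B + x_1 + ... + x_n)^m."""
    # coeffs[j] is the coefficient of t^j, where t = B + (variables not yet integrated).
    coeffs = [Fraction(0)] * m + [Fraction(1)]
    for _ in range(n):
        # Antiderivative Q(t) of the current polynomial, then P_new(t) = Q(t + A) - Q(t).
        anti = [Fraction(0)] + [c / (j + 1) for j, c in enumerate(coeffs)]
        new = [Fraction(0)] * len(anti)
        for j, c in enumerate(anti):
            for i in range(j + 1):  # binomial expansion of (t + A)^j
                new[i] += c * comb(j, i) * A ** (j - i)
            new[j] -= c
        coeffs = new
    return sum(c * B ** j for j, c in enumerate(coeffs))


def closed_form(m, n, A, B):
    """Right-hand side of the assumed integration formula."""
    s = sum((-1) ** (n - q) * comb(n, q) * (B + q * A) ** (m + n) for q in range(n + 1))
    return Fraction(factorial(m), factorial(m + n)) * s


def binom(a, b):
    """Binomial coefficient taken as zero outside 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0


def identity_holds(n, k, s):
    """Check C(k,s)C(n,s) + C(k-1,s)C(n,s+1) - C(k-1,s-1)C(n,s) = C(k-1,s)C(n+1,s+1)."""
    lhs = (binom(k, s) * binom(n, s)
           + binom(k - 1, s) * binom(n, s + 1)
           - binom(k - 1, s - 1) * binom(n, s))
    return lhs == binom(k - 1, s) * binom(n + 1, s + 1)


if __name__ == "__main__":
    A, B = Fraction(1, 4), Fraction(1, 3)
    print(iterated_integral(3, 2, A, B), closed_form(3, 2, A, B))
```

Rational arithmetic is used so that agreement is exact rather than within rounding error; the choice of $A$ and $B$ in the usage lines is arbitrary.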





The distribution function of $\omega_n$ is

71 —7“1 

Fix) = 1 + 




n + 
q + 


0 




+1 

where $r$ is the non-negative integer determined by the inequality

$$\frac{r}{n+1} \le x < \frac{r+1}{n+1}.$$

$F(x) = 0$ when $x \le 0$, $F(x) = 1$ when $x \ge n/(n+1)$ and $F(x)$ is a polynomial of degree $n$ in each of the intervals

$$\left(\frac{r-1}{n+1},\ \frac{r}{n+1}\right),\qquad r = 1, 2, \ldots, n.$$

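Theorem 2. The random variable $\omega_n$ is asymptotically normally distributed $(E(\omega_n), D(\omega_n))$; i.e., the distribution function of the standardized variable

$$\frac{\omega_n - E(\omega_n)}{D(\omega_n)}$$

approaches

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-t^2/2}\,dt.$$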


It is sufficient to prove that the moments of the standardized variable approach the moments of the normal distribution. For in general it is known that if the moments $\alpha_{nk}$ of $F_n(x)$ approach the moments $\alpha_k$ of a uniquely determined distribution function $F(x)$, then $F_n(x)$ converges to $F(x)$ in every continuity point of the latter (M. G. Kendall, Advanced Theory of Statistics, Vol. 1, Third edition, Charles Griffin and Co., 1943, pp. 110-112).


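Now $E(\omega_n) = \left(\dfrac{n}{n+1}\right)^{n+1} \to \dfrac1e$ and $D^2(\omega_n) \sim \dfrac{2e-5}{e^2}\cdot\dfrac1n = \dfrac{c}{n}$, so that the two variables

$$\frac{\omega_n - E(\omega_n)}{D(\omega_n)}\qquad\text{and}\qquad\left(\frac{n}{c}\right)^{1/2}\left(\omega_n - \frac1e\right)$$

have the same limiting distribution. Thus it is sufficient to prove that the moments of $\left(\dfrac{n}{c}\right)^{1/2}\left(\omega_n - \dfrac1e\right)$ tend to the moments of the normal distribution. In the following argument we take $\mu = k$ since $n \to \infty$.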
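The two limits just quoted can be illustrated numerically. The sketch below is not part of the original text; it assumes that the $k = 1$ and $k = 2$ cases of Theorem 1 reduce to $E(\omega_n) = \left(\frac{n}{n+1}\right)^{n+1}$ and $E(\omega_n^2) = \frac{2}{n+2}\left(\frac{n}{n+1}\right)^{n+2} + \frac{n}{n+2}\left(\frac{n-1}{n+1}\right)^{n+2}$, and compares $E(\omega_n)$ with $1/e$ and $n\,D^2(\omega_n)$ with $c = (2e-5)/e^2$. The function names and the constant `C_LIMIT` are hypothetical.

```python
import math


def mean_omega(n):
    """E(omega_n), the k = 1 case of the assumed closed form."""
    return (n / (n + 1)) ** (n + 1)


def second_moment_omega(n):
    """E(omega_n^2), the k = 2 case of the assumed closed form."""
    return ((2 / (n + 2)) * (n / (n + 1)) ** (n + 2)
            + (n / (n + 2)) * ((n - 1) / (n + 1)) ** (n + 2))


def variance_omega(n):
    """D^2(omega_n) as the second moment minus the squared mean."""
    return second_moment_omega(n) - mean_omega(n) ** 2


# Limiting value of n * D^2(omega_n) quoted in the text.
C_LIMIT = (2 * math.e - 5) / math.e ** 2

if __name__ == "__main__":
    for n in (10, 100, 1000, 100000):
        print(n, mean_omega(n), 1 / math.e, n * variance_omega(n), C_LIMIT)
```

Ordinary floating point is adequate here for $n$ up to about $10^5$; for much larger $n$ the subtraction in `variance_omega` loses precision and exact or extended arithmetic would be preferable.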

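$$E\left[\left(\frac{n}{c}\right)^{m/2}\left(\omega_n - \frac1e\right)^m\right] = \left(\frac{n}{c}\right)^{m/2}\sum_{k=0}^{m}\binom{m}{k}\alpha_{nk}\left(-\frac1e\right)^{m-k} = \frac{n^{m/2}\,m!}{(2e-5)^{m/2}}\left[\frac{(-1)^m}{m!} + \sum_{k=1}^{m}\sum_{s=0}^{k-1}\frac{(-1)^{m-k}\,n!\,e^k}{(n+k)!\,(m-k)!}\binom{n+1}{s+1}\binom{k-1}{s}\left(\frac{n-s}{n+1}\right)^{n+k}\right].\tag{2.1}$$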



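Suppose now that it has been proved that $E\left[\left(\dfrac{n}{c}\right)^{m}\left(\omega_n - \dfrac1e\right)^{2m}\right]$ tends to a finite limit as $n \to \infty$, i.e., that the limiting moments of order $2m$ exist, $m = 1, 2, \ldots$. If $m$ is odd, Schwarz's inequality gives

$$\left\{E\left[\left(\frac{n}{c}\right)^{m/2}\left(\omega_n - \frac1e\right)^{m}\right]\right\}^2 \le E\left[\left(\frac{n}{c}\right)^{(m-1)/2}\left(\omega_n - \frac1e\right)^{m-1}\right]E\left[\left(\frac{n}{c}\right)^{(m+1)/2}\left(\omega_n - \frac1e\right)^{m+1}\right],$$

in which both factors on the right are moments of even order. Hence, if $m$ is odd, $E\left[\left(\dfrac{n}{c}\right)^{m/2}\left(\omega_n - \dfrac1e\right)^{m}\right]$ is bounded as $n \to \infty$. Now the expression in the bracket on the right of (2.1) can be expanded in a convergent power series in $n^{-1}$ provided that $n > m$. Because of the factor $n^{m/2}$ and because the left hand side of (2.1) is bounded as $n \to \infty$ this power series must have $\dfrac{a_p}{n^p}$, where $p \ge \dfrac{m+1}{2}$ (since $m$ is odd), as its initial non-vanishing term. But then the left hand side of (2.1) must approach 0 as $n \to \infty$. Thus if the limiting moments of even order exist the limiting moments of odd order are zero. We may now restrict the discussion to even order moments.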

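Replacing $m$ by $2m$ in (2.1)

$$E\left[\left(\frac{n}{c}\right)^{m}\left(\omega_n - \frac1e\right)^{2m}\right] = \frac{n^m\,(2m)!}{(2e-5)^m}\left[\frac{1}{(2m)!} + \sum_{k=1}^{2m}\sum_{s=0}^{k-1}\frac{(-1)^k\,n!\,e^k}{(n+k)!\,(2m-k)!}\binom{n+1}{s+1}\binom{k-1}{s}\left(\frac{n-s}{n+1}\right)^{n+k}\right].$$

Let us introduce the index $q = k - s - 1$ which runs from 0 to $2m - 1$. Then

$$\begin{aligned}
E\left[\left(\frac{n}{c}\right)^{m}\left(\omega_n - \frac1e\right)^{2m}\right] &= \frac{n^m\,(2m)!}{(2e-5)^m}\left[\frac{1}{(2m)!} + \sum_{q=0}^{2m-1}\sum_{k=q+1}^{2m}\frac{(-1)^k\,n!\,e^k}{(n+k)!\,(2m-k)!}\binom{n+1}{k-q}\binom{k-1}{q}\left(\frac{n-k+q+1}{n+1}\right)^{n+k}\right]\\
&= \frac{n^m\,(2m)!}{(2e-5)^m}\left[a_0 + \frac{a_1}{n} + \frac{a_2}{n^2} + \cdots + \frac{a_m}{n^m} + \frac{a_{m+1}}{n^{m+1}} + \cdots\right].
\end{aligned}$$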


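In order for $\lim_{n\to\infty}E\left[\left(\dfrac{n}{c}\right)^{m}\left(\omega_n - \dfrac1e\right)^{2m}\right]$ to exist it is necessary to show that $a_i = 0$, $i = 0, 1, 2, \ldots, m-1$. Then

$$\lim_{n\to\infty}E\left[\left(\frac{n}{c}\right)^{m}\left(\omega_n - \frac1e\right)^{2m}\right] = \frac{a_m\,(2m)!}{(2e-5)^m}.$$

If we determine the coefficient $a_{iq}$ of $n^{-i}$ in the expansion in powers of $n^{-1}$ of

$$\sum_{k=q+1}^{2m}\frac{(-1)^k\,n!\,e^k}{(n+k)!\,(2m-k)!}\binom{n+1}{k-q}\binom{k-1}{q}\left(\frac{n-k+q+1}{n+1}\right)^{n+k}\tag{2.2}$$

we will then have

$$a_j = \sum_{q=0}^{j}a_{jq},\qquad j = 1, 2, \ldots, m.\tag{2.3}$$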


It can be established at once that $a_0 = 0$. For if we set $q = 0$ in (2.2) and let $n \to \infty$ then (2.2) has the limit $\sum_{k=1}^{2m}\dfrac{(-1)^k}{(2m-k)!\,k!} = -\dfrac{1}{(2m)!}$. To determine the expansion of (2.2) in powers of $n^{-1}$ it is sufficient to focus attention on the expansion in powers of $n^{-1}$ of

$$\frac{n!}{(n+k)!}\,(n+1)(n)\cdots(n-k+q+2)\left(\frac{n-k+q+1}{n+1}\right)^{n+k} = \frac{(n+1)(n)\cdots(n-k+q+2)}{(n+k)(n+k-1)\cdots(n+1)}\left(\frac{n-k+q+1}{n+1}\right)^{n+k}$$

or equivalently on the expansion in powers of $x$ of the function

$$\frac{x^q(1-x)(1-2x)\cdots(1-(k-q-2)x)}{(1+2x)(1+3x)\cdots(1+kx)}\left(\frac{1-(k-q-1)x}{1+x}\right)^{k+\frac1x} = x^q\left(a_{kq0} + a_{kq1}x + a_{kq2}x^2 + \cdots\right) = x^qF(x).$$

Here $a_{kq0} = e^{-(k-q)}$ and the other coefficients may be obtained by a recursion formula. Thus

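$$a_{kqp} = \frac{1}{p!}\,D^pF(x)\Big|_{x=0} = \frac{1}{p!}\,D^{p-1}\left[F(x)\,D\log F(x)\right]_{x=0} = \frac{1}{p!}\sum_{s=0}^{p-1}\binom{p-1}{s}D^{p-1-s}F(x)\,D^{s+1}\log F(x)\Big|_{x=0}.$$

But

$$\begin{aligned}
\log F(x) &= \left(\frac1x + k\right)\log\left(1 - (k-q-1)x\right) - \left(\frac1x + k\right)\log(1+x) + \sum_{\nu=1}^{k-q-2}\log(1-\nu x) - \sum_{\nu=2}^{k}\log(1+\nu x)\\
&= -(k-q) + \sum_{s=0}^{\infty}b_{kqs}\,x^{s+1},
\end{aligned}$$

where

$$b_{kqs} = -\frac{(k-q-1)^{s+2} - (-1)^{s}}{s+2} - k\,\frac{(k-q-1)^{s+1} - (-1)^{s+1}}{s+1} - \frac{1}{s+1}\sum_{\nu=1}^{k-q-2}\nu^{s+1} + \frac{(-1)^{s+1}}{s+1}\sum_{\nu=2}^{k}\nu^{s+1},$$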



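so that

$$a_{kqp} = \frac{1}{p!}\sum_{s=0}^{p-1}\binom{p-1}{s}(p-s-1)!\,a_{kq(p-s-1)}\,(s+1)!\,b_{kqs} = \frac1p\sum_{s=0}^{p-1}(s+1)\,a_{kq(p-s-1)}\,b_{kqs}.$$

Of $b_{kqs}$ we need merely notice that it is a polynomial in $k$ of degree $s + 2$ and that $b_{kq0} = -\frac52k^2 + Ak + B$, where $A$ and $B$ depend on $q$ only. We wish to determine the value of $a_{kq(i-q)}$ and to this end we solve the system of linear equations

$$a_{kqp} - \frac1p\sum_{s=0}^{p-2}(s+1)\,a_{kq(p-s-1)}\,b_{kqs} = a_{kq0}\,b_{kq(p-1)},\qquad p = 1, 2, \ldots, i-q.$$

$a_{kq(i-q)}$ is therefore a quotient of two determinants. The determinant in the denominator has the value $(-1)^{i-q}$ while the determinant in the numerator can be expanded by its last column and is therefore the product of $(-1)^{i-q}a_{kq0}$ and a determinant $\delta_{kqi}$ whose entries $d_{\alpha\beta}$, $\alpha, \beta = 1, 2, \ldots, i-q$, can be described as follows. If $\beta > \alpha + 1$ then $d_{\alpha\beta} = 0$. $d_{\alpha(\alpha+1)} = -1$ and when $\beta \le \alpha$, $d_{\alpha\beta} = \dfrac{\alpha-\beta+1}{\alpha}\,b_{kq(\alpha-\beta)}$, a polynomial of degree $\alpha - \beta + 2$. Thus $a_{kq(i-q)} = a_{kq0}\,\delta_{kqi}$.

The determinant $\delta_{kqi}$ is a polynomial of degree $2(i-q)$ in $k$ and the term of this degree comes only from the product of the diagonal elements. For $\delta_{kqi} = |d_{\alpha\beta}| = \sum \pm\prod_{\alpha=1}^{i-q}d_{\alpha\sigma(\alpha)}$ where $\sigma(\alpha) \le \alpha + 1$ and $(\sigma(1), \sigma(2), \ldots, \sigma(i-q))$ is a permutation of $(1, 2, \ldots, i-q)$. The term $\prod_{\alpha=1}^{i-q}d_{\alpha\sigma(\alpha)}$ has degree $\sum_{\alpha=1}^{i-q}\left(\alpha - \sigma(\alpha) + \delta(\alpha)\right) = \sum_{\alpha=1}^{i-q}\delta(\alpha)$ where $\delta(\alpha) = 2$ if $\sigma(\alpha) \le \alpha$ and $\delta(\alpha) = 1$ if $\sigma(\alpha) = \alpha + 1$. But $\sum_{\alpha=1}^{i-q}\delta(\alpha) = 2(i-q) \leftrightarrow \delta(\alpha) = 2 \leftrightarrow \sigma(\alpha) \le \alpha \leftrightarrow \sigma(\alpha) = \alpha$, so that it is the product of the diagonal terms and only that product which gives rise to the term of degree $2(i-q)$ in the expansion. Thus

$$\delta_{kqi} = \frac{1}{(i-q)!}\,b_{kq0}^{\,i-q} + \text{terms of lower degree in } k = \frac{1}{(i-q)!}\left(-\frac52\right)^{i-q}k^{2(i-q)} + \sum_{l=0}^{2(i-q)-1}B_l\,k^l,$$

where the coefficients $B_l$ do not depend on $k$.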

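We are now in position to evaluate $a_{iq}$.

$$\begin{aligned}
a_{iq} &= \sum_{k=q+1}^{2m}\frac{(-1)^k e^k}{(2m-k)!\,(k-q)!}\binom{k-1}{q}a_{kq(i-q)} = \sum_{k=q+1}^{2m}\frac{(-1)^k e^q}{(2m-k)!\,(k-q)!}\binom{k-1}{q}\delta_{kqi}\\
&= \frac{e^q}{(i-q)!}\left(-\frac52\right)^{i-q}\sum_{k=q+1}^{2m}\frac{(-1)^k k^{2(i-q)}}{(2m-k)!\,(k-q)!}\binom{k-1}{q} + e^q\sum_{l=0}^{2(i-q)-1}B_l\sum_{k=q+1}^{2m}\frac{(-1)^k k^{l}}{(2m-k)!\,(k-q)!}\binom{k-1}{q},
\end{aligned}\tag{2.4}$$

where the $B_l$ are the coefficients, independent of $k$, of the terms of lower degree in $\delta_{kqi}$.

To complete the evaluation of $a_{iq}$ we observe that (the case $q = 0$, $l = 0$ excepted)

$$\sum_{k=q+1}^{2m}\frac{(-1)^k k^{l}}{(2m-k)!\,(k-q)!}\binom{k-1}{q} = \begin{cases}\dfrac{1}{q!} & \text{if } l = 2(m-q),\\[6pt] 0 & \text{if } l < 2(m-q).\end{cases}\tag{2.5}$$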

(2 5) implies that a,^ = 0 if f <c ??i and therefore = 0 if j <C m The proof 
of (2.5) IS brief. We note that = 22 Cj ^ ^ , where Cj is independent 

of I and ci_i = (1 — 1)!. Then 


^ (-l)Vc‘ 

*4+-i {2m - k) \{k 


= E 


_A - A _ V 1 /2m - g\ 

-g)!\ 2 / i4+i g'(/i: - g - l)!(2m - g)'\fc - g / 

(T-7)cr) 

fLt(-«‘(T-7)(770]. 


Z-l 2m , , 

- 2 : i: (-1)* -nn#-K 

,_Q i-,+i (2m — g)!g!(fc — g — 1) 


Cj(j + g + 


pi (2m - g)7! 

The expression within the brackets is the coefficient of in 

(1 - 


1 


= (1 — a;)*"* and this is zero if; < 2(m — g) — 1 
and 1 if j = 2(m — g) — 1. Accordingly 

k^i (2m — g) i {k 

'0 


- 2 )'\ Q } 


[2(m - g) - l]l[2(7n - g) - 1 + g + 1]! ^ ^ 
(2m — g)![2(m — g) — IJlgl g' 


if f = 2(m — g), 
if I = 2(m — g), 


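and (2.5) is established. Returning to (2.4), $a_{iq} = 0$ when $i < m$, while

$$a_{mq} = \frac{e^q}{(m-q)!\,q!}\left(-\frac52\right)^{m-q},$$

and now applying this expression to (2.3)

$$a_m = \sum_{q=0}^{m}\frac{e^q}{(m-q)!\,q!}\left(-\frac52\right)^{m-q} = \frac{1}{m!}\left(e - \frac52\right)^m = \frac{(2e-5)^m}{m!\,2^m}.$$

Thus

$$\lim_{n\to\infty}E\left[\left(\frac{n}{c}\right)^{m}\left(\omega_n - \frac1e\right)^{2m}\right] = \frac{a_m\,(2m)!}{(2e-5)^m} = \frac{(2m)!}{m!\,2^m}$$

and these are precisely the even order moments of the normal distribution. Hence $\left(\dfrac{n}{c}\right)^{1/2}\left(\omega_n - \dfrac1e\right)$ is asymptotically normal and so is $\dfrac{\omega_n - E(\omega_n)}{D(\omega_n)}$.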





The skewness 

bin — E(o3„) 


=(^-y 


and kurtosis ^2 = —! of the standardized variable 


DM 


are 


ft - s - 

„ , 1 24e’ - 336fl’ + 1368fi - 1718 
+ i2e-5)^ 


.356 


n 


+ 0(n-=), 


+ 0(n-“) = 3 - 


105 


n 


+ o(vr\ 


3. Consistency. According to previous discussion in order to prove the consistency of the test for goodness of fit based on the asymptotically normal variable $\dfrac{\omega_n - E(\omega_n)}{D(\omega_n)}$ it is sufficient to show that, if $x_1, x_2, \ldots, x_n$ is an ordered sample from a population whose distribution function is $G(x)$, then the limiting mean of the random variable

$$\Omega_n = \frac12\sum_{i=1}^{n+1}\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right|$$

is not equal to $e^{-1}$ if $F(x) \not\equiv G(x)$ and the limiting variance of this variable is zero. This is the content of the next two theorems. In connection with these theorems it is to be observed that, when $y = F(x)$ is continuous, $F^{-1}(y)$, $0 \le y \le 1$, can be defined unambiguously by writing $F^{-1}(y) = [\operatorname{Sup} x : y = F(x)]$ except for $y = 0$, and $F^{-1}(0) = -\infty$. The function $k(x) = GF^{-1}(x)$ is then a non-decreasing function mapping $[0, 1]$ into $[0, 1]$ and such that $k(0) = 0$ and $k(1) = 1$. Now if $F'(x)$ exists for all but a finite number of points and is never zero then $F^{-1}(x)$ is continuous and so is $k(x)$. If further $G'(x)$ and $F'(x)$ exist and are continuous except for a finite number of points then ($F'(x) \ne 0$) $k'(x)$ enjoys the same property. These remarks justify the substitutions and partial integrations that are effected in the course of the next two theorems.

Theorem 3. Let $F(x)$ and $G(x)$ be continuous distribution functions whose derivatives exist and are continuous except for a finite number of points. If $x_1, x_2, \ldots, x_n$ is an ordered sample of $n$ values from the population whose distribution function is $G(x)$ then ($k(x) = GF^{-1}(x)$)

$$E(\Omega_n) = E\left(\frac12\sum_{i=1}^{n+1}\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right|\right) = \int_0^{n/(n+1)}\left[1 - k\left(x + \frac{1}{n+1}\right) + k(x)\right]^n dx \to \int_0^1 e^{-k'(x)}\,dx.$$

The integral $\int_0^1 e^{-k'(x)}\,dx$ has, relative to the class of monotonic functions such that $k(0) = 0$ and $k(1) = 1$, the minimum value $e^{-1}$ and assumes that value only when $k(x) \equiv x$, i.e. $F(x) \equiv G(x)$.
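A small numerical experiment may help to fix ideas about Theorem 3. The sketch below is purely illustrative and rests on assumptions not in the original: it takes $F$ uniform on $[0, 1]$ and $G(x) = x^2$ on $[0, 1]$, so that $k(x) = GF^{-1}(x) = x^2$ and $\int_0^1 e^{-k'(x)}\,dx = (1 - e^{-2})/2$. It evaluates the integral for $E(\Omega_n)$ by the midpoint rule and compares it with a Monte Carlo estimate of $E(\Omega_n)$ and with the limit. All names are hypothetical.

```python
import math
import random


def k(x):
    """Assumed example: F uniform on [0, 1], G(x) = x^2 there, so k(x) = x^2."""
    return x * x


def expected_Omega(n, steps=20000):
    """Midpoint-rule value of the integral of [1 - k(x + 1/(n+1)) + k(x)]^n over [0, n/(n+1)]."""
    h = 1.0 / (n + 1)
    dx = (n / (n + 1)) / steps
    return dx * sum(
        (1.0 - k(x + h) + k(x)) ** n
        for x in (dx * (j + 0.5) for j in range(steps))
    )


def simulated_Omega(n, reps=2000, seed=1):
    """Monte Carlo estimate of E(Omega_n) when the sample comes from G."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # Inversion sampling from G: G^{-1}(u) = sqrt(u); F(x) = x on [0, 1].
        y = [0.0] + sorted(math.sqrt(rng.random()) for _ in range(n)) + [1.0]
        total += 0.5 * sum(abs(y[i] - y[i - 1] - 1.0 / (n + 1)) for i in range(1, n + 2))
    return total / reps


# Integral of exp(-k'(x)) = exp(-2x) over [0, 1] for this example.
LIMIT = (1.0 - math.exp(-2.0)) / 2.0

if __name__ == "__main__":
    n = 200
    print(expected_Omega(n), simulated_Omega(n), LIMIT, math.exp(-1.0))
```

In this example the limiting mean is about 0.432, visibly above $e^{-1} \approx 0.368$, which is the separation between hypothesis and alternative that the consistency argument relies on.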

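Let us suppose first that $F'(x) \ne 0$. Then $F^{-1}(x)$ is continuous and it is differentiable at all but a finite number of points as is also the function $GF^{-1}(x) = k(x)$.

$$E(\Omega_n) = \frac12\sum_{i=2}^{n}E\left(\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right|\right) + \frac12E\left(\left|F(x_1) - \frac{1}{n+1}\right|\right) + \frac12E\left(\left|1 - F(x_n) - \frac{1}{n+1}\right|\right).\tag{3.1}$$

The joint probability density element of $x_{i-1}$ and $x_i$ is

$$\frac{n!}{(i-2)!\,(n-i)!}\,G(x_{i-1})^{i-2}\left(1 - G(x_i)\right)^{n-i}dG(x_{i-1})\,dG(x_i)$$

in the domain $-\infty < x_{i-1} < x_i < +\infty$ and zero outside that domain. Hence

$$\begin{aligned}
\frac12\sum_{i=2}^{n}E\left(\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right|\right) &= \frac12\sum_{i=2}^{n}\frac{n!}{(i-2)!\,(n-i)!}\int_{-\infty}^{\infty}\int_{-\infty}^{x_i}\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right|G(x_{i-1})^{i-2}\left(1 - G(x_i)\right)^{n-i}dG(x_{i-1})\,dG(x_i)\\
&= \frac12\,n(n-1)\iint_{-\infty<X<Y<\infty}\left|F(Y) - F(X) - \frac{1}{n+1}\right|\left[1 - G(Y) + G(X)\right]^{n-2}dG(X)\,dG(Y),
\end{aligned}$$

and making the transformation $y = F(Y)$ and $x = F(X)$ the integral on the right can be written

$$\begin{aligned}
\frac12\,n(n-1)&\int_0^1\int_0^y\left|y - x - \frac{1}{n+1}\right|\left[1 - k(y) + k(x)\right]^{n-2}dk(x)\,dk(y)\\
&= \frac12\,n(n-1)\int_0^1\int_0^y\left(\frac{1}{n+1} - y + x\right)\left[1 - k(y) + k(x)\right]^{n-2}dk(x)\,dk(y)\\
&\quad + n(n-1)\int_{1/(n+1)}^{1}\int_0^{y-\frac{1}{n+1}}\left(y - x - \frac{1}{n+1}\right)\left[1 - k(y) + k(x)\right]^{n-2}dk(x)\,dk(y).
\end{aligned}$$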

Integrating partially with respect to x, the expression on the right becomes 

- !? ffn- k{y) + fc(x)]"-^ dx dkiy) 

2 Jo 

f (y- 1 ) [1 - k{y)r^ dkiy) 

JlJn+l V « + 1 / 

/.l |•y—{l^n+l) 

+ n I I [1 - *(2/) + 

Jl/n+1 *^0 





and now integrating with respect to y 

I S E (U(x.) - ^ I) = + H' [1 _ ]o(x)T dx 

+ K f f ti ~ Hx)r dx — f kix)" dx 

Ji Jo wl/n+l *'0 

J ^n/n+l r* / -j \ “jrt 

[ ^l-k{x + ^^^j + Hx)^ dx. 

The other two terms in (3.1) are treated similarly. The probability density 
element of xi is n(l — G*(a:i))"“^ dG{x^ so that 

*® (1"“ - I) ■ 5 £ I 

= 5 (1-fc(a;))’'-^dfcCT) 

Z JQ \ 71 “T J. I 

-I 1 ^1/n+l 

+ 5 f (1 — Hx)y dx. 

I Jl/n+l 

Similarly we find that 

1 ^n/n+l -j 

+ - / /c(x)” dx — ^ h{xY dx. 

Z J!) Z Jl/n+l 

Thus

$$E(\Omega_n) = \int_0^{n/(n+1)}\left[1 - k\left(x + \frac{1}{n+1}\right) + k(x)\right]^n dx.$$

This result is, however, independent of the hypothesis $F'(x) \ne 0$. For if $F'(x)$ is sometimes zero we may select a sequence of distribution functions $F_m(x)$, $m = 1, 2, \ldots$, which converges everywhere to $F(x)$ and which is such that $F_m'(x) \ne 0$. The $F_m(x)$ otherwise satisfy the conditions of the theorem. If $\Omega_{mn}$ is that function of $x_1, x_2, \ldots, x_n$ obtained by replacing $F(x)$ by $F_m(x)$ in $\Omega_n$ then $\Omega_{mn}$ converges to $\Omega_n$ for every fixed set of $x_1, x_2, \ldots, x_n$ and $E(\Omega_{mn})$ converges to $E(\Omega_n)$ since both $\Omega_{mn}$ and $\Omega_n$ are bounded by 1. Furthermore if $x_0$ is any value such that $F'(x_0) \ne 0$ and $y_0 = F(x_0)$ then $F_m^{-1}(y_0)$ converges to $F^{-1}(y_0) = x_0$. For if $x_1$ is a cluster point of the set $F_m^{-1}(y_0)$, then there exists, for a given $\epsilon$, a sufficiently large $m$ such that $|F(x_1) - F_m(x_1)| < \epsilon$ (because $F_m(x) \to F(x)$) while, for the same $m$, $|F_m(x_1) - y_0| < \epsilon$ because of the continuity of $F_m(x)$. Thus $|F(x_1) - y_0| < 2\epsilon$ and, since $\epsilon$ is arbitrary, $y_0 = F(x_1) = F(x_0)$. So $x_1 = x_0$ since $F'(x_0) \ne 0$. Thus $F_m^{-1}(y) \to F^{-1}(y)$ for any value of $y$ such that if $x$ is mapped into $y$ by $F(x)$ then $F'(x) \ne 0$. This set on the $y$ axis however includes all $y$ except for a set of measure zero and so $F_m^{-1}(y) \to F^{-1}(y)$ almost everywhere. So $k_m(y) = GF_m^{-1}(y) \to GF^{-1}(y) = k(y)$ almost everywhere and

$$\left[1 - k_m\left(x + \frac{1}{n+1}\right) + k_m(x)\right]^n \to \left[1 - k\left(x + \frac{1}{n+1}\right) + k(x)\right]^n$$

almost everywhere. Then

$$\int_0^{n/(n+1)}\left[1 - k_m\left(x + \frac{1}{n+1}\right) + k_m(x)\right]^n dx \to \int_0^{n/(n+1)}\left[1 - k\left(x + \frac{1}{n+1}\right) + k(x)\right]^n dx$$

since both integrands are bounded by 1. Therefore the equality

$$E(\Omega_{mn}) = \int_0^{n/(n+1)}\left[1 - k_m\left(x + \frac{1}{n+1}\right) + k_m(x)\right]^n dx$$

is preserved as $m \to \infty$.

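Now $k(x)$ is a monotonic function and hence has a derivative almost everywhere. Then

$$\left[1 - k\left(x + \frac{1}{n+1}\right) + k(x)\right]^n = \left[1 - \frac{1}{n+1}\cdot\frac{k\left(x + \frac{1}{n+1}\right) - k(x)}{\frac{1}{n+1}}\right]^n$$

converges to $e^{-k'(x)}$ almost everywhere. If we write

$$H_n(x) = \left[1 - k\left(x + \frac{1}{n+1}\right) + k(x)\right]^n$$

when $0 \le x \le \dfrac{n}{n+1}$ and $H_n(x) = 0$ when $\dfrac{n}{n+1} < x \le 1$, then

$$\int_0^1 H_n(x)\,dx = \int_0^{n/(n+1)}\left[1 - k\left(x + \frac{1}{n+1}\right) + k(x)\right]^n dx \to \int_0^1 e^{-k'(x)}\,dx$$

as $n \to \infty$. The curve $y = e^{-x}$ lies always above its tangents and the tangent at $x = 1$ is $y = -\dfrac{x}{e} + \dfrac{2}{e}$. Thus $e^{-x} \ge -\dfrac{x}{e} + \dfrac{2}{e}$ for all $x$, equality holding only when $x = 1$, and therefore $e^{-k'(x)} \ge -\dfrac{1}{e}k'(x) + \dfrac{2}{e}$, equality holding only when $k'(x) = 1$.

So

$$\int_0^1 e^{-k'(x)}\,dx \ge -\frac1e\int_0^1 k'(x)\,dx + \frac2e,$$

equality holding if and only if $k'(x) = 1$ almost everywhere. But for any monotonic non-decreasing function

$$\int_0^1 k'(x)\,dx \le k(1) - k(0),$$

equality holding if and only if $k(x)$ is absolutely continuous. Hence

$$\int_0^1 e^{-k'(x)}\,dx \ge -\frac1e\int_0^1 k'(x)\,dx + \frac2e \ge -\frac1e + \frac2e = \frac1e,$$

and the equality runs through if and only if $k(x)$ is an absolutely continuous function such that $k'(x) = 1$ almost everywhere. But this is true of $k(x)$ if and only if $k(x) \equiv x$ and this in turn is true if and only if $F(x) \equiv G(x)$.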
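The chain of inequalities above can be illustrated numerically. The following sketch is an illustration under assumptions that are not in the original: it uses the family $k(x) = x^p$ (so $k'(x) = p\,x^{p-1}$), evaluates $\int_0^1 e^{-k'(x)}\,dx$ by the midpoint rule, and checks the tangent-line bound $e^{-t} \ge (2 - t)/e$ on a grid. The function names are hypothetical.

```python
import math


def integral_exp_minus_kprime(p, steps=100000):
    """Midpoint-rule value of the integral over [0, 1] of exp(-k'(x)) for k(x) = x**p."""
    dx = 1.0 / steps
    return dx * sum(
        math.exp(-p * (dx * (j + 0.5)) ** (p - 1.0)) for j in range(steps)
    )


def tangent_gap(t):
    """exp(-t) minus the tangent line at t = 1; non-negative, zero only at t = 1."""
    return math.exp(-t) - (2.0 - t) / math.e


if __name__ == "__main__":
    for p in (0.5, 1.0, 2.0, 3.0):
        print(p, integral_exp_minus_kprime(p), math.exp(-1.0))
```

Only $p = 1$, that is $k(x) \equiv x$, attains the bound $e^{-1}$; the other exponents give strictly larger values, in line with the statement of Theorem 3.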

Theorem 4. The random variable $\Omega_n$ has limiting variance zero; i.e.,

$$\lim_{n\to\infty}E(\Omega_n^2) = \left[\int_0^1 e^{-k'(x)}\,dx\right]^2.$$


As before we assume first that F^{x) 9 ^ 0. Then 


E(Ql) = E 
(4.1) + E 




Fix.) - Fix,-d 


n -f- 1 


)] 


Fix.) 


E 


1 


[K 


Fixt) - 


+ 


1 - Fix. 




n 4- 1 

Suppose [Sup x: /c(x) = 0] = a and [Inf x: fc(x) = 1] = b. We may then obtain 

lim E b'(xi)-f]„ in the following manner: 

L| n + 11 J 

1 


Fix,) - 


(4.2) 




n + 1 


— a 


On 


[il(o'„)]‘/l 

But 0„ g 1 so that B(o’„) is bounded as n —> oo. On the other hand 

f - aj (1 - GM)""' dGix,) 

= n f (x — a — —(1 — kix))”~^ dfc(x) 

*^0 \ 71+1/ 

- (« + i^i) +1*2 (»= - « - (1 - *W)' *=■ 





r“ 

As n ^ the expression on the right tends to + / 2 {x — a) dz = 0 

Jo 

Thus the expression on the right of (4.2) goes to zero as n —» and therefore 


( 4 , 3 ) lim E 


1 I ^ / 

E(xi) -0„ = lim E [afl„] = a / 

71 “T M J n-*«« Jo 


(x) 


dx. 


In a similar manner we obtain 


(4.4) 
and 

(4.5) 


lim E 

n-»ee L. 


1 - Fix,:) - 


n + 1 


On 


= (1 - 5) f 
Jo 


-lc'(i) 


dx 




F(xi} - 


n + 1 


+ 


1 - F(x„) 


n-rrlJ] 


= — + 1 — 5)^ 

The first term on the right of (4.1) remains to be investigated. We have 

■ i ® [S ~ 3^) ] 


(4.6) 


+1E ['f t, I fW - f («.-.) - 11 fbJ - rfc-.) - |] 


+ 


1 n 

5® E 
^ L '"S 


Fix,) - Fiz,-i) - 


71 . + 1 


Fix,+i) - Fix.) - ■ 


The joint probability density element of x.-i and x. is 


nl 


it - 2)1 in - ^)l 


(1 - Gix.-i))'-^Gix,)"-' dGix^i) dGix.) 


so that 


-■o<X<r<« 

• [1 - GiY) + GiX)r^ dGiX) dGiY) 

= -nin- 1) t r (y - X -~ 

4 Jo Jo \ 71+1/ 





In this latter double integral we integrate first with respect to x and then with 
respect to y obtaining 

(’ - n-Tl) (^1 - 

+ ^ JJ [1 — Hy) + ^(a:)]" dx dy, 

0<X<V<1 


and proceeding to the limit 

- dn)’] 

- “U " 5 /. '■*>*+5 // 


dx dy 


(4.7) 


0<I<K<1 
A;(x)—A:(v) 


= - i i (1 - i fj dxdy. 


' 0<*<»<1 

The joint probability density element of x<-i, x,, x/-i, x, when ,/ > i + 1 is 

|1 - OifSfT'(«3b.) rJ(?fe,.i) JOfe), 


SO 


F{x,) - F{x^i) - 


Z L *“2 ;-»+2 

= 1 n(n - l)(n - 2)(7i - 3) fffj | FiY) - F{X) - 


n + 1 


:y) F{xj-i) |] 


(4.8) ■ 


F(y) - Fill) - 


o<z<r<i/<v<i 

[1 -am + o{u) 


V — n — 


n + 1 

- a(Y) + a(x)r-UG(x) dO(Y) dOiu) dam 

= lnin-l){n-2Kn'-3) ffjf ]^y - x - 

0 <a<v<u<«<l 

• [1 — /c(d) + k{u) — k(y) H- fc(a:)]"~'‘ dk{x) dk{y) dk{u) dkiv). 
The joint probability density element of ®,_i, a;,, a!,+i is 
nl 


n -f" 1 


(i — 2) !(n — i — 1)! 


' da{x,-i) da{x,) dG(x^i) 





and so 


■E 


Lr-2 


F(xO - - 


1 


n -j- 1 


F{x^+l) - F{x^) - 


.in(»-l)(»-2) /// |p(I0_TO-_^j 


n -h 1 


(4.9) 


0<X-<1'<7<1 
1 


F(V) - FiY) - I [1 - GiY) + (?(X)]"-' dG{X) dG(Y) dGiV) 

-in(»-l)(«-2) /// 


V - IJ - 

» n + 1 


0 <l<l/<il<l 

[1 — k{v) + k{x)]’'~^ dk{x) dk{y) dkiv) 


We introduce the symbol Sip, q) as follows 


Sip,q) = 


-1 


if g ^ p + 
lf g > p + 


n+ V 
1 


71+1’ 

Then m the integral on the right of (4.8) we perform a partial integration with 
respect to u and add to the integral on the right of (4.9) We get 


^ nin - 1) (n - 2) jjj 


71+1 


y - X- 


0<1<V<»<1 

■ [1 — hiy) + fc(a;)]”-^ dkix) dk{y) dk{v) 
-^niv - Din - 2) Jjjj Siu,v) 


71+1 


y - X- 


n + 1 


0 

■ [1 - kiv).+ kiu) - kiy) + fc(ai)]"“® dkix) dkiy) dkiv) du, 

and now integrating with respect to v in the triple integral and performing par¬ 
tial integrations with respect to x and collecting terms the sum of (4.8) and 
(4.9) becomes 


nin — 1) nin — 1) 

4(71 + 1)2 2(71 + 1) io 


y 


71+1 


11 - kiy)r^ dkiy) - 


jj Six, 7/)[l - kiy) + kix)]'' ^ dxdkiy) + |7i(77 - 1) 


0 <!E<1/<1 


/// 




y - 


71+1 


[1 — fc(t7) + kiv) — kiy)] 


iT1-2 


• dkiy) dkiv) du + ^ 77 ( 7 ^ — 1) 

JJJJ Siu, v)Six, y) [1 - kiv) + kiu) - kiy) + fc(a:)]"“' dkiy) dkiv) dx du. 


0<*<V<«<«<1 





Now some tedious, although in principle straightforward, calculations show 
that the first three terms of this expression approach 

(4.10) -I - la ~ -h) + f dx, 

Jo 

that the triple integral approaches 

(4.11) la + lail - h) + la^ - a f 

Jo 

and that the quadruple integral approaches 

2 If dx du - £ dx - {I- 6)jf' dx 

0<a:<«<l 

^ // dir du + (1 - &)“ + lh(l - &) + i 

0 <*<tt<l 

Thus collecting the results of (4.3), (4.4), (4.5), (4.7), (4.10), (4.11), and (4.12) we have

$$\lim_{n\to\infty}E(\Omega_n^2) = 2\iint_{0<x<u<1}e^{-k'(x)-k'(u)}\,dx\,du.$$

Since the integrand is symmetrical in the variables $u$ and $x$ we may write

$$\lim_{n\to\infty}E(\Omega_n^2) = \iint_{\substack{0<x<1\\0<u<1}}e^{-k'(x)-k'(u)}\,dx\,du = \left[\int_0^1 e^{-k'(x)}\,dx\right]^2,\tag{4.13}$$

and this proves the theorem in the case $F'(x) \ne 0$.

Using the procedure of Theorem 3 we may however extend the theorem to include the possibility that $F'(x)$ is sometimes zero. But it must be shown additionally that the sequence $F_m(x)$ can be so chosen that $\Omega_{mn}$ converges to $\Omega_n$ uniformly in $n$, i.e. that, for a given $\epsilon$, $|\Omega_{mn} - \Omega_n| < \epsilon$ for $m$ sufficiently large and for any value of $n$. If this is true then, observing that $0 \le \Omega_{mn} + \Omega_n \le 2$, $|\Omega_{mn}^2 - \Omega_n^2| < 2\epsilon$ and

$$\left|E(\Omega_{mn}^2) - E(\Omega_n^2)\right| \le E\left(\left|\Omega_{mn}^2 - \Omega_n^2\right|\right) \le 2\epsilon$$

independently of $n$. Letting $n \to \infty$

$$\left|\left[\int_0^1 e^{-k_m'(x)}\,dx\right]^2 - \lim_{n\to\infty}E(\Omega_n^2)\right| \le 2\epsilon,$$

and now letting $m \to \infty$ (the $F_m(x)$ constructed below are such that $k_m'(x) \to k'(x)$)

$$\left|\left[\int_0^1 e^{-k'(x)}\,dx\right]^2 - \lim_{n\to\infty}E(\Omega_n^2)\right| \le 2\epsilon.$$

Since $\epsilon$ is arbitrary this implies (4.13), so that the theorem is extended to include the possibility that $F'(x)$ is sometimes zero. That the sequence $F_m(x)$ can be chosen so that $\Omega_{mn}$ converges to $\Omega_n$ uniformly in $n$ can be shown as follows. The set of points on the $x$ axis for which $F'(x) = 0$ maps into a set of points on the $y$ axis of measure zero. For any $m$ we may enclose this set on the $y$ axis in an open set $S$ of measure less than $\dfrac1m$. $S$ is the union of disjoint open intervals $S_i$, $i = 1, 2, \ldots$. The sets $T_i = F^{-1}(S_i)$ on the $x$ axis are disjoint open intervals. Now we may construct a distribution function $F_m(x)$ which coincides with $F(x)$ outside $\sum T_i$, is such that $F_m'(x) \ne 0$, and otherwise satisfies the conditions of the theorem (stated explicitly in Theorem 3). The sequence $F_m(x)$ converges to $F(x)$. Furthermore

$$\begin{aligned}
|\Omega_n - \Omega_{mn}| &= \left|\frac12\sum_{i=1}^{n+1}\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right| - \frac12\sum_{i=1}^{n+1}\left|F_m(x_i) - F_m(x_{i-1}) - \frac{1}{n+1}\right|\right|\\
&\le \frac12\sum_{i=1}^{n+1}\left|\,\left|F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right| - \left|F_m(x_i) - F_m(x_{i-1}) - \frac{1}{n+1}\right|\,\right|\\
&\le \frac12\sum_{i=1}^{n+1}\left|\left[F(x_i) - F(x_{i-1})\right] - \left[F_m(x_i) - F_m(x_{i-1})\right]\right|.
\end{aligned}\tag{4.14}$$

For any particular set of values of $x_1, x_2, \ldots, x_n$ some (possibly none or possibly all) of the $x_i$ will fall into intervals of the $\sum T_i$. If this finite set of intervals, each containing at least one $x_i$, is say $T_1, T_2, \ldots, T_k$, then a simple analysis of the sum on the right of (4.14) shows that it is less than twice the total length of the intervals $F(T_1), F(T_2), \ldots, F(T_k)$ and this total length is less than $\dfrac1m$. Thus $|\Omega_{mn} - \Omega_n| < \dfrac1m$ and this result is independent of $n$.


REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] B. F. Kimball, "Some basic theorems for developing tests of fit for the case of the non-parametric probability distribution function I," Annals of Math. Stat., Vol. 18 (1947), pp. 540-548.

[3] P. A. P. Moran, "The random division of an interval," Jour. Royal Stat. Soc., Supp., Vol. 9 (1947), pp. 92-98.

[4] N. Smirnoff, "Sur la distribution de $\omega^2$," Comptes Rendus de l'Académie des Sciences, Paris, 202 (1932), p. 449.

[5] A. Wald and J. Wolfowitz, "On a test of whether two samples are from the same population," Annals of Math. Stat., Vol. 11 (1940), pp. 147-162.

[6] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943.



ON A PROBLEM IN THE THEORY OF k POPULATIONS¹

By Raghu Raj Bahadur
University of North Carolina

1. Summary. In two recent papers, Paulson [1] and Mosteller [2] have called attention to several unsolved problems in $k$-sample theory. A problem which is typical of the ones considered in this paper is as follows.

Let $\pi_1, \pi_2, \ldots, \pi_k$ be a set of normal populations, $\pi_i$ having an unknown mean $m_i$ and variance $\sigma^2$, $G(x, \theta_i)$ being the distribution function which characterizes $\pi_i$. Samples of equal size are drawn from each population, $\bar{x}_i$ being the sample means, and $s^2$ the estimate of $\sigma^2$ obtained. The problem is to construct a suitable decision rule $d = d(\{\bar{x}_i\}; s^2)$ to select one or more populations, the object being to minimize the expected value of the random distribution function

$$G(x \mid s(d)) = \sum_{i=1}^{k}z_i(d)\,G(x, \theta_i)\Big/\sum_{i=1}^{k}z_i(d),$$

where $z_i(d) = 1$ if $\pi_i$ is selected by $d$, and $= 0$ otherwise. It is shown that under the restriction of impartial decision, the rule $d_k$ = “Always select only the population corresponding to the greatest $\bar{x}_i$” cannot be improved, no matter what $x$ or the true parameter values may be. It follows (i) that $d_k$ is the uniformly best decision rule in the class of impartial decision rules for all weight functions of type

W = max {mi\ — Z > 

and (ii) that the customary F and t tests of analysis of variance are not relevant 
to the problem. 

This result is an application of Theorem 1 which applies to a number of similar 
problems concerning k populations, especially when the populations admit 
sufficient statistics for their parameters. Two examples of statistical applications 
are given in Section 6. 

2. Introduction. It has been recognized for some time that the classical theory of statistical inference does not provide direct answers to many problems which are of great interest in the applications. One of them, which arises in the theory of samples from $k$ populations, is what Mosteller has called “the problem of the greatest one.” The word “population” is used here for a process, $\pi(\theta)$ say, which generates independent random variables $X_1, X_2, \ldots$, each $X$ having the same distribution function $P(X < x) = G(x, \theta)$ say, and a set of $X$'s which have been generated by $\pi$ is called a sample from the population. We shall describe the problem, as also the formulation adopted in the following section, in terms of two special cases. These cases occur when the $k$ given populations $\pi_1, \pi_2, \ldots, \pi_k$ are such that $\pi_i$ is characterized by the distribution function

$$G(x, \theta_i) = h\left(\frac{x - b_i}{c_i}\right),\qquad \theta_i = (b_i, c_i),\quad c_i > 0,\quad i = 1, 2, \ldots, k,$$

where $h(x)$ is an absolutely continuous non-decreasing function with $h(-\infty) = 0$, $h(+\infty) = 1$. Such sets of populations appear frequently in statistical theory and practice, a given set of normal, or rectangular, or gamma type populations being familiar instances.

¹ This paper is based on a thesis submitted to the Department of Mathematical Statistics, University of North Carolina, in partial fulfilment of the requirements for the Ph.D. degree. This work was sponsored by the Office of Naval Research.

Case 1. Let $X_{ij}$, $j = 1, 2, \ldots, n$ be a sample from the population $\pi_i$, $i = 1, 2, \ldots, k$ where $\pi_i$ is characterized by the distribution function $h\left(\dfrac{x - b_i}{c}\right)$, $b_i$ being unknown, and suppose that the statistician is asked to select the population which he thinks has the greatest $b_i$, but is allowed to select more than one population if (as a consequence, say, of “insignificant” outcomes of tests of differences between populations) he does not feel confident enough to select only one. This situation will occur if, for example, the $X_{ij}$'s are observed yields in an agricultural experiment in which each of $k$ varieties has been replicated $n$ times, the yield with variety $\pi_i$ being normally distributed with unknown mean $m_i$ and variance $\sigma^2$, and the statistician is asked to recommend one or more varieties for general use. (Cf. Example 1 in Section 6.)

Case 2. Suppose now that the $X_{ij}$'s are samples from populations $\pi_i$ characterized by distribution functions $h\left(\dfrac{x - b}{c_i}\right)$, $c_i > 0$ unknown, $i = 1, 2, \ldots, k$, and the statistician is asked to select the population which he thinks has the greatest $1/c_i$, but is allowed to select more than one population.² This situation will occur if, for instance, the $\pi_i$ are factories producing an article having a numerical quality characteristic $X$, $h\left(\dfrac{x - b}{c_i}\right)$ being the distribution function of $X$ in the product of $\pi_i$, and the statistician is required to assign production to one or more factories, the object being to obtain product of stable quality, $b$ being the standard characteristic.

It is clear that the usual statistical theory, which confines itself to estimation of parameters $\theta_i$ and testing of hypotheses of the kind $H_0$($b_i$ = constant), is inadequate to deal with problems of this sort, where a definite course of action is required of the statistician. It is hardly necessary to add that selection is an important problem in the applications, and the testing of hypotheses is often an indirect attempt to justify selection. In accordance with Wald's formulation of the problem of statistical inference,³ we proceed to consider explicitly the purpose of selection and the “loss” involved in making any particular selection.

² There is no essential difference between the problem of the greatest one and the problem of the least one. In order to avoid trivial complications, the terminology of the former will be used wherever possible.

3. A class of weight functions. Let $\pi_1, \pi_2, \ldots, \pi_k$ be a given set of
populations, $\pi_i$ being characterized by the distribution function $G(x, \theta_i)$, and
let us denote any particular selection, say $s$, by indicator variables
$z_1, z_2, \ldots, z_k$, where $z_i = 1$ if $\pi_i$ is selected and $= 0$ otherwise. Since any
meaningful selection must concern itself with the random variables generated by the
populations selected, consider the function

$$G(x \mid s) = \sum_{i=1}^{k} z_i\, G(x, \theta_i) \Big/ \sum_{i=1}^{k} z_i .$$

$G(x \mid s)$ is a distribution function, and provides a logical and direct overall
picture of the effect of making the selection $s$, since no distinction is made between
the populations selected. In immediate generalization, we define a “selection” $s$ to be
a vector, $s = (p_1, p_2, \ldots, p_k)$ with $p_i \ge 0$, $\sum_{i=1}^{k} p_i = 1$, and put

$$G(x \mid s) = \sum_{i=1}^{k} p_i\, G(x, \theta_i).$$

Roughly speaking, $G(x \mid s)$ is the distribution function which characterizes the
mixed population obtained if sampling rates $p_1, p_2, \ldots, p_k$ are assigned to
$\pi_1, \pi_2, \ldots, \pi_k$ respectively, $p_i = 0$ corresponding to rejection of
$\pi_i$. Henceforth, a selection vector will be called a decision.
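The mixed distribution function of a decision can be evaluated directly. The following is a minimal sketch, assuming normal populations with hypothetical parameters $\theta_i = (\text{mean}, \text{sd})$; the function names `normal_cdf` and `mixture_cdf` are illustrative and do not come from the paper.

```python
import math

def normal_cdf(x, mean, sd):
    """Distribution function G(x, theta) of a normal population, theta = (mean, sd)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def mixture_cdf(x, decision, thetas):
    """G(x | s) = sum_i p_i * G(x, theta_i) for a decision s = (p_1, ..., p_k)."""
    if any(p < 0 for p in decision) or abs(sum(decision) - 1.0) > 1e-9:
        raise ValueError("a decision needs p_i >= 0 and sum(p_i) == 1")
    return sum(p * normal_cdf(x, m, sd) for p, (m, sd) in zip(decision, thetas))

# Hypothetical populations (assumed for illustration only).
thetas = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
s = (0.0, 1.0 / 3.0, 2.0 / 3.0)   # p_1 = 0 rejects the first population
print(mixture_cdf(1.0, s, thetas))
```

Setting one $p_i = 1$ recovers the distribution function of the single population $\pi_i$, which is the special case of selecting one population outright.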

Now, if each of the $G(x, \theta_i)$'s were known, an appropriate decision $s$ could be
chosen without resort to sampling. If not, the statistician must construct (in
advance) and use an $s$-valued function of the sample values. Such a function,
say $d$, is called a statistical decision function or decision rule. The decision $s$
according to $d$, say $s(d) = (p_1(d), p_2(d), \ldots, p_k(d))$, is in general a random
vector, so that for any fixed $x$, $G(x \mid s(d))$ is a random variable. Consider the
distribution function

$$H(x \mid d) = E[G(x \mid s(d))] = \sum_{i=1}^{k} G(x, \theta_i)\, E[p_i(d)],$$

where $E$ denotes the expectation operator. It represents the average overall effect of
using the decision rule $d$, and so affords a reasonable description of the performance
of $d$. Clearly, the problem is to construct $d$ in such a way that $H(x \mid d)$ has
desirable properties.

The “desirable properties” will depend, of course, on the particular problem
being considered. Returning to our two cases, denote the arbitrary but given
set of all possible parameter points $\omega = (\theta_1, \theta_2, \ldots, \theta_k)$ by
$\Omega$, and let $D$ be a given class of decision rules $d = d(\{X_{ij}\})$. Then, in
Case 1 we wish to choose $d^* \in D$ such that $H(x \mid d^*) = \inf_{d \in D} H(x \mid d)$
for every $x$ and every $\omega \in \Omega$. In Case 2, we wish to choose $d^*$ so that for
every $x$ and every $\omega$ we have

$$H(x \mid d^*) = \inf_{d \in D} H(x \mid d) \ \text{ whenever } x < b, \quad \text{and} \quad = \sup_{d \in D} H(x \mid d) \ \text{ whenever } x > b.$$

These requirements are very strong, and in general no such $d^*$ will exist without
heavy restrictions on $\Omega$ and on $D$. (Cf. however the corollary to Theorem 1. It
will be found that in a number of cases no restrictions on $\Omega$ are required provided
that $D$ is the class defined there.) For some purposes, it may be sufficient to
consider functionals of $H(x \mid d)$. The functionals which are most useful in the
applications are the moments. Thus, one may wish to find $d^*$ such that
$\alpha(d^*) = \sup_{d \in D} \alpha(d)$, where

$$\alpha(d) = \int_{-\infty}^{\infty} g(x)\, dH(x \mid d),$$

$g(x)$ being some appropriate function.


² See, for example, [3], Chapter VI.



PROBLEM OF k POPULATIONS




For example, in Case 1 we may take $g(x) = x$. Then $\alpha(d)$ is the mean of a random
variable having $H(x \mid d)$ for its distribution function, and constructing a suitable
$d$ to maximize $\alpha(d)$ is “the problem of the greatest mean.” Again, in Case 2 we
may take $g(x) = -(x - b)^2$, and in that case maximizing $\alpha(d)$ would be “the
problem of the smallest variance.”³

In terms of mixtures of distributions, $H(x \mid d)$ is the mixture of $G(x \mid s)$ with
respect to $\delta$, where $\delta$ is the probability measure induced by the decision rule
$d$ on the class of Borel sets in the space of all possible decisions $s$. It follows by
the use of Theorem 5 in [4], or otherwise directly, that maximizing $\alpha(d)$ is
equivalent to maximizing the expected value ($\delta$) of
$\sum_{i=1}^{k} p_i \int_{-\infty}^{+\infty} g(x)\, dG(x, \theta_i)$. Writing

$$g_i = \int_{-\infty}^{+\infty} g(x)\, dG(x, \theta_i),$$

one may say that the object is to construct $d$ in such a way that the expected value
($\delta$) of the “weight function”

$$W(\omega, s) = \max_i \{g_i\} - \sum_{i=1}^{k} p_i\, g_i$$

is minimized for every $\omega$. $W$ represents the “loss” incurred by choosing the
decision $s$ when the true parameter point is $\omega$. It will be seen that $W$ defined
according to (A) in Section 5 includes essentially all weight functions which are likely
to be of interest in the type of problem considered in this paper.

We have so far not emphasized the obvious fact that the probability measure $\delta$
which is induced by $d$ on the space of decisions will in general depend on the
unknown parameter point $\omega$. Therefore, the expected value ($\delta$) of $W$ is to be
written as $E[W(\omega, s(d)) \mid \omega] = r(d \mid \omega)$ say. Following the usual
terminology, we shall call $r(d \mid \omega)$ the risk function of the rule $d$, and shall
say that $d^* \in D$ is the uniformly best rule in the class $D$ if
$r(d^* \mid \omega) = \inf_{d \in D} r(d \mid \omega)$ for all $\omega \in \Omega$.
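As a small numerical illustration of the weight function, the sketch below evaluates the loss "largest $g_i$ minus the $p$-weighted average of the $g_i$" for a few decisions. The values of `g` are hypothetical, chosen only for the example, and the helper name `weight` is not from the paper.

```python
def weight(g, decision):
    """Loss of a decision: max_i g_i minus sum_i p_i * g_i."""
    return max(g) - sum(p * gi for p, gi in zip(decision, g))

g = [1.0, 2.0, 3.0]                       # hypothetical values g_1, g_2, g_3
print(weight(g, (0.0, 0.0, 1.0)))         # everything assigned to the best population
print(weight(g, (1.0, 0.0, 0.0)))         # everything assigned to the worst population
print(weight(g, (0.0, 1.0 / 3.0, 2.0 / 3.0)))
```

The loss is never negative and vanishes exactly when all of the proportion is placed on populations attaining the largest $g_i$; the risk of a rule is the expectation of this loss over the sampling distribution of the decision.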


4. A class of decision rules. The class of decision rules to which we shall
confine ourselves is rather limited, and may be described as follows, with reference
to the previous sections:

(i) Given independent random variables $\{X_{ij}\}$, $j = 1, 2, \ldots, n$;
$i = 1, 2, \ldots, k$, from the $k$ populations $\pi_i$, let

$$X_i = \varphi(X_{i1}, X_{i2}, \ldots, X_{in}), \quad i = 1, 2, \ldots, k, \quad \text{and} \quad Y = \psi(\{X_{ij}\}),$$

where $X_1, X_2, \ldots, X_k$; $Y$ is an independent set, and the $X_i$'s have frequency
functions. The choice of $\varphi$ and $\psi$ will depend upon particular cases: in
Case 1, $X_1, \ldots, X_k$; $Y$ will be statistics relevant to the estimation of

³ An unpublished theorem of Herbert Robbins insures that if a $d^*$ satisfies the
requirements of the preceding paragraph, it will also maximize all functionals
$\alpha(d)$ corresponding to such functions $g(x)$.





$b_1, b_2, \ldots, b_k$; $c$ respectively, and in Case 2 they will be relevant to
$c_1, c_2, \ldots, c_k$; $b$.⁴

(ii) Given the statistics $\{X_i\}$; $Y$, $D = D(\varphi, \psi)$ is the class of all
impartial decision rules which are based on them. A decision rule
$d = d(\{X_i\}; Y)$ is said to be impartial if it has the following structure. Let
$X_{(1)} < X_{(2)} < \cdots < X_{(k)}$ be the ordered $X_i$'s. Then $d$ defines
non-negative random variables $\lambda_j = \lambda_j(X_{(1)}, X_{(2)}, \ldots, X_{(k)}; Y)$,
$j = 1, 2, \ldots, k$, such that $\sum_{j=1}^{k} \lambda_j = 1$, and $\lambda_j$ is the
proportion $p(d)$ which is assigned by $d$ to the $\pi$ corresponding to $X_{(j)}$. We
use the term “impartial” for such decision rules because they determine the
proportions $[\lambda_1, \lambda_2, \ldots, \lambda_k]$ without regard to which $X$ belongs
to which population, and then assign these proportions in strict order of the $X_i$'s.
We shall specify the intuitively plausible class of impartial decision rules for the
important normal cases, and give a few instances of such rules.
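The structure of an impartial rule can be sketched in a few lines: the proportions are fixed by rank alone and then handed to the populations in the order of their statistics. This is an illustrative sketch with a constant proportion vector; the name `apply_impartial_rule` is hypothetical, and ties among the statistics are assumed absent, as in the continuous case.

```python
def apply_impartial_rule(x, lam):
    """Return (p_1, ..., p_k): population i receives lam[j] when x[i] is the
    (j+1)-th smallest statistic. Ties are assumed not to occur."""
    if len(x) != len(lam):
        raise ValueError("need one proportion per population")
    order = sorted(range(len(x)), key=lambda i: x[i])   # indices by ascending x
    p = [0.0] * len(x)
    for rank, i in enumerate(order):
        p[i] = lam[rank]
    return p

# Hypothetical statistics for three populations; 2/3 goes to the greatest
# statistic and 1/3 to the second greatest.
print(apply_impartial_rule([2.5, 0.1, 1.7], [0.0, 1.0 / 3.0, 2.0 / 3.0]))
```

Relabelling the populations permutes the output in the same way, which is the sense in which the rule pays no regard to which statistic belongs to which population.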

Suppose first that the $X_{ij}$'s are from normal populations having means $m_i$
and a common variance $\sigma^2$, and that we are interested in the problem of the
greatest mean. $D$ is then the class of all impartial decision rules which are based
on the statistics

$$X_i = \bar{X}_i = \sum_{j=1}^{n} X_{ij}/n, \quad i = 1, 2, \ldots, k,$$

$$Y = S^2 = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2 / k(n - 1).$$

The numerical factors are of no importance, and may be omitted (Cf. footnote 4.
See also Example 2 in Section 6, where such factors have been omitted for
convenience). A rather simple member of $D$ is the rule
$[\lambda_{k-1} = 1/3, \lambda_k = 2/3]$, i.e.
“Always assign the proportion 2/3 to the population which has the greatest
$X_i$, and the proportion 1/3 to the population with the second greatest.” In using
this rule, although the $\lambda_j$'s remain constant from sample to sample, the decision
$s(d)$ is a random vector. In general, however, the $\lambda_j$'s will themselves be random
variables. This is the case if, for instance, one insists on utilising the standard
test of differences between populations, and uses the impartial rule “Perform the
$F$ test of $H_0(m_i = \text{constant})$ at the five per cent level. If $H_0$ is rejected, assign
the proportion 1 to the population which has the greatest $X_i$. If not, assign
equal proportions to all populations for which $X_i > \sum_{i=1}^{k} X_i/k$, and zero
proportions to the rest.” Another type of impartial decision rule according to which the
$\lambda_j$'s are random variables will be described at the end of Example 1 in the next
section. Now, it is (intuitively) clear that if the sample size $n$ is indefinitely
large, the rule $[\lambda_k = 1]$, i.e., “Always assign the proportion 1 to the population

⁴ It is unnecessary to specify here the exact relation between the statistics and the
parameters: (a) the definition of the parameter which determines a distribution function
$G(x, \theta)$ is more or less arbitrary, e.g., instead of writing $\theta = (b, c)$ we may
write $\theta = (b^3/c, \cosh c)$, and (b) $D(\varphi_1, \psi_1) = D(\varphi_2, \psi_2)$
provided that $\varphi_2 = f(\varphi_1)$, $\psi_2 = g(\psi_1)$, where $f(x)$, $g(x)$ are
strictly monotonic functions. It will be seen that Theorem 1 is invariant under
such transformations of parameters and/or of statistics.





with the greatest $X_i$” cannot be improved, no matter what the true parameter
values may be. Our main result (Theorem 1) asserts that the statement is in
fact valid for any $n$, provided that one restricts oneself to the class of impartial
decision rules.
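The claim can be explored numerically. The sketch below is a Monte Carlo illustration, not part of the paper's argument: it assumes three normal populations with hypothetical means, draws each sample mean directly from its normal sampling distribution, and estimates the risk (with $g_i$ taken to be the population mean) of three constant impartial rules. All names and numerical settings are assumptions made for the example.

```python
import math
import random

def estimate_risk(lam, means, sigma, n, reps=20000, seed=0):
    """Monte Carlo estimate of E[max_i m_i - sum_i p_i(d) m_i] for the constant
    impartial rule lam, where lam[j] goes to the (j+1)-th smallest sample mean."""
    rng = random.Random(seed)
    sd_of_mean = sigma / math.sqrt(n)       # sd of a mean of n observations
    best = max(means)
    total = 0.0
    for _ in range(reps):
        xbar = [rng.gauss(m, sd_of_mean) for m in means]
        order = sorted(range(len(means)), key=lambda i: xbar[i])
        total += best - sum(lam[rank] * means[i] for rank, i in enumerate(order))
    return total / reps

means, sigma, n = (0.0, 0.5, 1.0), 1.0, 4   # hypothetical parameter point
for name, lam in [("all to greatest", (0.0, 0.0, 1.0)),
                  ("1/3 and 2/3", (0.0, 1.0 / 3.0, 2.0 / 3.0)),
                  ("all to least", (1.0, 0.0, 0.0))]:
    print(name, round(estimate_risk(lam, means, sigma, n), 3))
```

Using the same seed for every rule means the rules are compared on identical simulated samples, which reduces the noise in the comparison. In runs of this kind one would expect the rule that assigns everything to the greatest sample mean to show the smallest estimated risk, in line with the statement above.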

In a similar way, if the $X_{ij}$'s are from normal populations having a common
mean $m$ and variances $\sigma_i^2$, $D$ would be the class of all impartial decision rules
which are based on the statistics

$$X_i = S_i^2 = \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2/(n - 1), \quad i = 1, 2, \ldots, k,$$

$$Y = \bar{X} = \sum_{i=1}^{k} \sum_{j=1}^{n} X_{ij}/kn,$$

and analogous remarks will apply to this case.

It should be observed that in a given case the appropriate statistics $\{X_i\}$; $Y$
may not be as obvious as in the case of populations like the normal which admit
sufficient statistics for their parameters. This real difficulty is not to be confused
with the ambiguities mentioned in footnote 4. Furthermore, given the $X_i$'s
there may not exist $Y = \psi(\{X_{ij}\})$ which is independent of the $X_i$'s; we shall then
assume, without invalidating our result, that the parameter which $Y$ is supposed
to estimate is known. Theorem 1 becomes operative only after such questions
have been resolved.

5. The uniformly best decision rule. It is convenient to define here some
terms which will be used subsequently without further explanation. All functions
are assumed to be Borel measurable. Sets will be denoted by curly brackets: thus
$\{f = c\}$ is the set on which $f = c$ holds, and $\{a_i\}$ is the set of all $a_i$ in question.
“Measure” will refer to ordinary Lebesgue measure in the $xy$ plane.

Definition 1. Given $k$ independent random variables $X_i$, $i = 1, 2, \ldots, k$,
such that each $X_i$ has a frequency function, let $X_{(j)}$, $j = 1, 2, \ldots, k$, be the
ordered set, $X_{(j)}$ being the $j$th $X_i$ in ascending order of magnitude. Then
$A_{ij} = \{X_i = X_{(j)}\}$, and $a_{ij}$ is the characteristic function of the set $A_{ij}$,
that is, $a_{ij} = 1$ for any point of $A_{ij}$ and $= 0$ elsewhere.

Since the $X_i$'s have a joint distribution which is absolutely continuous, the
sets $A_{ij}$ are well defined with probability one. Clearly, we have
$\sum_{i=1}^{k} a_{ij} = 1$ for every $j$ and $\sum_{j=1}^{k} a_{ij} = 1$ for every $i$, with
probability one.

Definition 2. Let $\beta = (b_1, b_2, \ldots, b_k)$ be a vector of real numbers $b_i$,
and $\phi = (f_1, f_2, \ldots, f_k)$ a vector of real-valued functions $f_i(x)$ defined for
every real $x$. We shall say that $\phi \in \Gamma(\beta)$ if for any $r, s = 1, 2, \ldots, k$ for
which $b_r \le b_s$, the set $\{f_r(x) f_s(y) < f_r(y) f_s(x),\ x < y\}$ is of measure zero.
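The condition of Definition 2 can be probed on a grid. The sketch below is illustrative only: it lists the grid points $x < y$ at which $f_r(x) f_s(y) < f_r(y) f_s(x)$, for a pair of normal densities and for a pair of Cauchy densities with hypothetical location parameters. A grid check can exhibit violations but cannot by itself prove that the exceptional set has measure zero.

```python
import math
from itertools import combinations

def normal_pdf(x, m, sd=1.0):
    return math.exp(-0.5 * ((x - m) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def cauchy_pdf(x, m):
    return 1.0 / (math.pi * (1.0 + (x - m) ** 2))

def violations(f_r, f_s, grid):
    """Grid points x < y with f_r(x) f_s(y) < f_r(y) f_s(x); grid must be ascending."""
    return [(x, y) for x, y in combinations(grid, 2)
            if f_r(x) * f_s(y) < f_r(y) * f_s(x)]

grid = [i * 0.5 for i in range(-10, 11)]          # -5.0, -4.5, ..., 5.0

# Normal densities with means 0 <= 1: no violating points are expected.
normal_bad = violations(lambda t: normal_pdf(t, 0.0), lambda t: normal_pdf(t, 1.0), grid)
# Cauchy densities with medians -1 <= 1: violating points do occur.
cauchy_bad = violations(lambda t: cauchy_pdf(t, -1.0), lambda t: cauchy_pdf(t, 1.0), grid)
print(len(normal_bad), len(cauchy_bad))
```

For the normal pair the ratio of the two products is $\exp[(y - x)(m_s - m_r)/\sigma^2] \ge 1$, so the list is empty; for the Cauchy pair the condition fails on a set of positive measure, which is the situation taken up in Example 3.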

We require the following 

Lemma. Suppose that $X_1, X_2, \ldots, X_k$; $Y$ are independent random variables,
$X_i$ having a frequency function $f_i(x)$, and that
$\phi = (f_1, f_2, \ldots, f_k) \in \Gamma(\beta)$, where
$\beta = (b_1, b_2, \ldots, b_k)$ with

$$b_1 \le b_2 \le \cdots \le b_k. \tag{1}$$





Then, for any non-negative random variable
$\lambda = \lambda(X_{(1)}, X_{(2)}, \ldots, X_{(k)}; Y)$ and
any $p, q, m = 1, 2, \ldots, k$ with $p \le q$, we have

$$\sum_{i=m}^{k} E(\lambda a_{ip}) \le \sum_{i=m}^{k} E(\lambda a_{iq}). \tag{2}$$

Proof. Since (2) holds trivially if $p = q$ or if $m = 1$, suppose that $p < q$
and $m \ge 2$. Writing
$B(m, j) = \{\sum_{i=m}^{k} a_{ij} = 1\} = \bigcup_{i=m}^{k} A_{ij}$, (2) is equivalent to
$\int_{B(m,q)} \lambda\, dP \ge \int_{B(m,p)} \lambda\, dP$, and hence to

$$\int_{B(m,q)B'(m,p)} \lambda\, dP \ \ge \int_{B(m,p)B'(m,q)} \lambda\, dP, \tag{3}$$

where $B'$ denotes the complement of $B$, and $P$ the probability measure in
$(x_1, x_2, \ldots, x_k, y)$ space.

For any permutation $i_1 i_2 \cdots i_k$ of $1 2 3 \cdots k$, define
$O(i_1 i_2 \cdots i_k) = A_{i_1 1} A_{i_2 2} \cdots A_{i_k k}$. Clearly, the $O$'s
corresponding to different permutations are disjoint and each of the sets
$B(m, q)B'(m, p)$ and $B(m, p)B'(m, q)$ is the set-theoretic sum of certain $O$'s.
Now, it is easy to see that

$$O \subset B(m, q)B'(m, p) \ \text{ if and only if } \ i_p = 1, \text{ or } 2, \ldots, \text{ or } m - 1, \text{ and } i_q = m, \text{ or } m + 1, \ldots, \text{ or } k; \tag{4}$$

$$O^* \subset B(m, p)B'(m, q) \ \text{ if and only if } \ i_p^* = m, \text{ or } m + 1, \ldots, \text{ or } k, \text{ and } i_q^* = 1, \text{ or } 2, \ldots, \text{ or } m - 1.$$

Hence a one-one correspondence between subsets $O(i_1 \cdots i_k)$ of $B(m, q)B'(m, p)$
and subsets $O^* = O(i_1^* \cdots i_k^*)$ of $B(m, p)B'(m, q)$ exists through interchange
of the $p$th and $q$th elements of the defining permutations, the other elements
remaining the same. It will be sufficient to prove that if $O$ and $O^*$ are any pair of
corresponding subsets, the integral of $\lambda$ over $O$ is greater than or equal to its
integral over $O^*$, for then (3) will follow by addition.

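It is clear that for any $O$,

$$\int_O \lambda\, dP = \int\!\!\int_R \lambda(t_1, \ldots, t_k; y) \prod_{r=1}^{k} f_{i_r}(t_r) \prod_{r=1}^{k} dt_r\, dF(y), \tag{5}$$

where $R$ is the domain $\{t_1 < t_2 < \cdots < t_k\}$ and $F(y)$ is the distribution function
of $Y$. Let $O$ and $O^*$ be any pair of corresponding subsets. It follows from (5)
that

$$\int_O \lambda\, dP - \int_{O^*} \lambda\, dP = \int\!\!\int_R Q \Big[\prod_{r \ne p, q} f_{i_r}(t_r)\Big] \prod_{r=1}^{k} dt_r\, dF(y),$$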





where

$$Q = \lambda(t_1, \ldots, t_k; y)\,[f_{i_p}(t_p) f_{i_q}(t_q) - f_{i_q}(t_p) f_{i_p}(t_q)]. \tag{6}$$

From (4) and (1) we have $b_{i_p} \le b_{i_q}$. Since $p < q$ implies that $t_p < t_q$ over $R$,
and $\phi \in \Gamma(\beta)$, it follows that the expression in square brackets in (6) is (except
perhaps for a set of measure zero) non-negative over $R$. Since $\lambda$ is also
non-negative, it follows that $Q$ is non-negative over $R$, and the Lemma is proved.

We shall now state and prove the main result. Note that the statistic $Y$ is not
necessarily real-valued.

Theorem 1. Suppose that

(A). $\Omega$ is a given set of points $\omega = (\theta_1, \theta_2, \ldots, \theta_k)$.
$\beta(\omega) = (b_1, b_2, \ldots, b_k)$ and $\gamma(\omega) = (g_1, g_2, \ldots, g_k)$ are defined
for every $\omega$ such that $b_p \le b_q$ implies $g_p \le g_q$ for every
$p, q = 1, 2, \ldots, k$. Given an $s = (p_1, p_2, \ldots, p_k)$ with $p_i \ge 0$ and
$\sum_{i=1}^{k} p_i = 1$,

$$W(\omega, s) = \max_i \{g_i\} - \sum_{i=1}^{k} p_i\, g_i.$$

(B). $X_1, X_2, \ldots, X_k$; $Y$ are independent random variables, each $X_i$ having
a frequency function $f(x, \theta_i) = f_i(x)$ say, and
$\phi(\omega) = (f_1, f_2, \ldots, f_k)$.

(C). $D$ is the class of all decision rules $d$ such that
$d = d(X_{(1)}, X_{(2)}, \ldots, X_{(k)}; Y) = [\lambda_1, \lambda_2, \ldots, \lambda_k]$,
$\lambda_j \ge 0$, $\sum_{j=1}^{k} \lambda_j = 1$, and
$s(d) = (p_1(d), p_2(d), \ldots, p_k(d))$ where
$p_i(d) = \sum_{j=1}^{k} \lambda_j a_{ij}$, $i = 1, 2, \ldots, k$.
Given $d \in D$, $r(d \mid \omega) = E[W(\omega, s(d)) \mid \omega]$.

(D). For every $\omega$, $\phi \in \Gamma(\beta)$.⁵

Then, for every $\omega$, $r(d_1 \mid \omega) = \sup_{d \in D} r(d \mid \omega)$ and
$r(d_k \mid \omega) = \inf_{d \in D} r(d \mid \omega)$, where
$d_1 = [1, 0, 0, \ldots, 0]$ and $d_k = [0, 0, \ldots, 0, 1]$.

Corollary. Suppose that $\pi_i$, $i = 1, 2, \ldots, k$ are populations characterized
by distribution functions $G(x, \theta_i) = F((x - b_i)/c_i)$, $c_i > 0$. For any fixed $x$, let

$$G(x \mid \omega, s) = \sum_{i=1}^{k} p_i\, G(x, \theta_i), \quad \text{and} \quad H(x \mid d, \omega) = E[G(x \mid \omega, s(d)) \mid \omega].$$

Case 1. If for every $\omega$, (i) $c_1 = c_2 = \cdots = c_k$,
(ii) $\phi \in \Gamma(\beta)$, where $\beta = (b_1, b_2, \ldots, b_k)$,
then, for every $\omega$,

$$H(x \mid d_k, \omega) = \inf_{d \in D} H(x \mid d, \omega).$$

Case 2. If for every $\omega$, (i) $b_1 = b_2 = \cdots = b_k = b(\omega)$, say,
(ii) $\phi \in \Gamma(\beta)$, where $\beta = (c_1, c_2, \ldots, c_k)$,
then, for every $\omega$,

$$H(x \mid d_1, \omega) = \begin{cases} \inf_{d \in D} H(x \mid d, \omega) & \text{if } x < b(\omega), \\ \sup_{d \in D} H(x \mid d, \omega) & \text{if } x > b(\omega). \end{cases}$$

⁵ Note that $\phi \in \Gamma(-\beta)$ is equivalent to $\phi^* \in \Gamma(\beta)$, where
$\phi^* = (f_1^*, f_2^*, \ldots, f_k^*)$, and $f_i^*$ is the frequency function of
$X_i^* = -X_i$.


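Proof. Choose and fix an arbitrary $\omega \in \Omega$. Without loss of generality we may
assume the notation to be so chosen (by simultaneous interchanges of indices $i$
in each of $\{\theta_i\}$, $\{b_i\}$, $\{g_i\}$, $\{p_i\}$, $\{X_i\}$, $\{f_i\}$, and $\{a_{ij}\}$,
$j = 1, 2, \ldots, k$) that (1) holds. It then follows that
$g_1 \le g_2 \le \cdots \le g_k$ and we write

$$g_i = g_1 + h_2 + h_3 + \cdots + h_i, \quad h_i \ge 0, \quad i = 2, 3, \ldots, k. \tag{7}$$

Choose and fix an arbitrary member of the class of impartial decision rules,
say $d = [\lambda_1, \lambda_2, \ldots, \lambda_k]$. We have

$$r(d \mid \omega) = \max_i \{g_i\} - \sum_{i,j=1}^{k} g_i\, E(\lambda_j a_{ij}). \tag{8}$$

Now

$$\sum_{i,j=1}^{k} g_i\, E(\lambda_j a_{ij}) = \sum_{i,j=1}^{k} (g_1 + h_2 + \cdots + h_i)\, E(\lambda_j a_{ij}) = g_1 + \sum_{m=2}^{k} \sum_{j=1}^{k} \Big[\sum_{i=m}^{k} E(\lambda_j a_{ij})\Big] h_m. \tag{9}$$

Since $\lambda_j = \lambda_j(X_{(1)}, X_{(2)}, \ldots, X_{(k)}; Y) \ge 0$, it follows from the
Lemma that

$$\sum_{i=m}^{k} E(\lambda_j a_{ij}) \le \sum_{i=m}^{k} E(\lambda_j a_{ik}) \quad \text{for every } m \text{ and every } j, \tag{10}$$

by writing $\lambda = \lambda_j$, $p = j$, and $q = k$ in (2). By using (7), (9) and (10) it
follows that

$$\sum_{i,j=1}^{k} g_i\, E(\lambda_j a_{ij}) \le g_1 + \sum_{m=2}^{k} \Big[\sum_{j=1}^{k} \sum_{i=m}^{k} E(\lambda_j a_{ik})\Big] h_m = g_1 + \sum_{m=2}^{k} \sum_{i=m}^{k} h_m E(a_{ik}) = \sum_{i=1}^{k} g_i\, E(a_{ik}). \tag{11}$$

Therefore, by (8) and (11),

$$r(d \mid \omega) \ge \max_i \{g_i\} - \sum_{i=1}^{k} g_i\, E(a_{ik}) = r(d_k \mid \omega), \tag{12}$$

by definition of $d_k$. The inequality $r(d \mid \omega) \le r(d_1 \mid \omega)$ follows from (8)
and (9) by a similar use of the Lemma. Since both $d \in D$ and $\omega \in \Omega$ are
arbitrary, this completes the proof of Theorem 1.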





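The verification of the corollary is as follows. Choose and fix an arbitrary $x$
and write $t_i = G(x, \theta_i)$.

Case 1. Let $\gamma(\omega) = (1 - t_1, 1 - t_2, \ldots, 1 - t_k)$. Then
$r(d \mid \omega) = H(x \mid d, \omega) - \min_i \{t_i\}$, and it follows from the Theorem that
$H(x \mid d_1, \omega) = \sup_{d \in D} H(x \mid d, \omega)$ and
$H(x \mid d_k, \omega) = \inf_{d \in D} H(x \mid d, \omega)$, for all $\omega$.

Case 2. Let

$$\gamma(\omega) = \begin{cases} (t_1, t_2, \ldots, t_k) & \text{if } b(\omega) > x, \\ (1 - t_1, 1 - t_2, \ldots, 1 - t_k) & \text{otherwise.} \end{cases}$$

Then we have

$$r(d \mid \omega) = \begin{cases} \max_i \{t_i\} - H(x \mid d, \omega) & \text{if } b(\omega) > x, \\ H(x \mid d, \omega) - \min_i \{t_i\} & \text{otherwise,} \end{cases}$$

so that

$$H(x \mid d_1, \omega) = \begin{cases} \inf_{d \in D} H(x \mid d, \omega) & \text{if } b(\omega) > x, \\ \sup_{d \in D} H(x \mid d, \omega) & \text{otherwise,} \end{cases}$$

and conversely for $H(x \mid d_k, \omega)$, for all $\omega$.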

The preceding proofs suggest that perhaps (D) is not a necessary condition,
but the following theorem for the case of two populations shows that it is
indispensable if Theorem 1 is to hold in general.

Theorem 2. Suppose that (A), (B), and (C) hold with $k = 2$ and $\theta_1$, $\theta_2$
real-valued, that the set $\Omega$ of points $\omega = (\theta_1, \theta_2)$ is denumerable, that
$\beta(\omega) = \omega$, that $\theta_1 \ne \theta_2$ for any $\omega$, and that $Y$ is a fixed
constant. Let $\mu(\omega) = \min_i \{\theta_i\}$, $\nu(\omega) = \max_i \{\theta_i\}$, and
defining the sets

$$R(\omega) = \{f(t_1, \mu) f(t_2, \nu) < f(t_2, \mu) f(t_1, \nu),\ t_1 < t_2\},$$

$$S(\omega) = \{f(t_1, \mu) f(t_2, \nu) > f(t_2, \mu) f(t_1, \nu),\ t_1 < t_2\}$$

in the $t_1, t_2$-plane, put

$$R^*(t_1, t_2) = \sum_{\omega} R(\omega), \qquad S^*(t_1, t_2) = \sum_{\omega} S(\omega).$$

Then a uniformly best decision rule in the class $D$ exists if and only if the set
$R^* S^*$ is of measure zero. Subject to existence, the uniformly best rule, say $d^*$,
may be defined as

$$d^* = \begin{cases} [1, 0] & \text{if } (X_{(1)}, X_{(2)}) \in R^*, \\ [0, 1] & \text{otherwise.} \end{cases}$$

The proof is quite simple, and will not be given. It is clear that under the 
hypotheses of this theorem, the conclusion of Theorem 1 is valid if and only if 
the set R* is of measure zero, that is, if and only if condition (D) holds. 





6. Examples and discussion. We begin with two applications of Theorem 1.

Example 1. Suppose that grain is to be raised on a given area, say $A$, of land.
$k$ varieties, $\pi_1, \pi_2, \ldots, \pi_k$ say, are available, the yields per unit area
being normally distributed with unknown means $m_i$ and a common variance $\sigma^2$, also
unknown. A preliminary field experiment (in which $n$ plots of unit area were
assigned to each variety) has been carried out, and $\{X_{ij}\}$, $j = 1, 2, \ldots, n$;
$i = 1, 2, \ldots, k$ is the set of independent plot-yields obtained. The statistician
is asked to suggest how the available land should be divided between the $k$
varieties, the object being to make the total expected yield as large as possible.⁶

Suppose that an area $A p_i$ is assigned to $\pi_i$, $i = 1, 2, \ldots, k$, with
$\sum_{i=1}^{k} p_i = 1$. Then the expected total yield is $\sum_{i=1}^{k} A p_i m_i$. Our
object is to choose the set $(p_1, p_2, \ldots, p_k) = s$ so as to minimize the “loss”

$$W(\omega, s) = \max_i \{A m_i\} - \sum_{i=1}^{k} A m_i p_i.$$

Since the $m_i$'s are unknown, one must construct an appropriate $s$-valued
function of the $X_{ij}$'s, say $d$, and set $s(d) = d(\{X_{ij}\})$. The expected “loss” in
using this procedure is given by $E[W(\omega, s(d)) \mid \omega] = r(d \mid \omega)$, and the
problem is to construct a $d$ which makes $r(d \mid \omega)$ as small as possible. (See (A)
and (C). Here we have set $\theta_i = (m_i, \sigma)$,
$\omega = (\theta_1, \theta_2, \ldots, \theta_k)$, $\beta(\omega) = (m_1, m_2, \ldots, m_k)$ and
$\gamma(\omega) = (A m_1, A m_2, \ldots, A m_k)$.)

Let $\bar{X}_i = \sum_{j=1}^{n} X_{ij}/n$, $i = 1, 2, \ldots, k$, and
$S^2 = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2 / k(n - 1)$.
Since $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$; $S^2$ is a set of sufficient statistics, it is
easy to see by taking conditional expectations that corresponding to any decision rule
based on the $X_{ij}$'s, there exists one defined in terms of the $\bar{X}_i$'s and $S^2$ alone
such that the risk functions $r$ of the two are identically equal for all possible values
of the unknown parameters. Clearly, one may confine oneself to decision rules of
the type $d = d(\{\bar{X}_i\}; S^2)$. Now, the frequency function of $\bar{X}_i$ is
$f_i(x) = (n/2\pi\sigma^2)^{1/2} \exp[-n(x - m_i)^2/2\sigma^2]$, and it is readily seen that
$m_r < m_s$ and $x < y$ imply $f_r(x) f_s(y) > f_s(x) f_r(y)$. It follows that in the class
of all impartial procedures which are based on $\{\bar{X}_i\}$; $S^2$, the uniformly best
procedure is to assign the whole area $A$ to the variety with the greatest observed
yield. (Note that by the corollary to Theorem 1, a much stronger result than the one
required here holds. Cf. footnote 3.)

Although Paulson did not set up a weight function in his discussion of the 
selection problem for the present case of samples of equal size from k normal 
populations having unknown means and a common variance, also unknown, he 

⁶ A double expectation is involved: the expected consequence of a given decision, and
the expected decision in using a particular decision rule. The argument given is justified
since it is assumed that the random variables generated by the $\pi$'s subsequent to decision
are independent of the random variables on which decision is based. Cf. Section 3. This
remark applies to Example 2 also.





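gave a class $\{d_c\}$ of decision rules and evaluated some probabilities ($P(G_1)$ and
$P_k$, [1], pp. 96-97) which suggest that some of the applications he had in mind
are similar to the one given here. In our notation, the rule $d_c$ is defined as follows
for any given $c > 0$.

$$d_c = [\lambda_1, \lambda_2, \ldots, \lambda_k], \quad \text{where} \quad \lambda_j = z_j \Big/ \sum_{r=1}^{k} z_r, \quad j = 1, 2, \ldots, k,$$

with

$$z_j = \begin{cases} 1 & \text{if } X_{(k)} - c(S/\sqrt{n}) < X_{(j)} \le X_{(k)}, \\ 0 & \text{otherwise.} \end{cases}$$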
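A rule of this type is easy to state in code. The sketch below is an assumption-laden illustration rather than Paulson's procedure as published: it retains every population whose sample mean lies within $c\,S/\sqrt{n}$ of the greatest sample mean and, as an assumption made here, divides the proportion equally among the retained populations. The function name and the numerical inputs are hypothetical.

```python
import math

def paulson_type_rule(sample_means, s, n, c):
    """Proportions per population: equal shares (an assumption of this sketch)
    among populations whose mean exceeds max(mean) - c * s / sqrt(n)."""
    if c <= 0 or n <= 0 or s < 0:
        raise ValueError("need c > 0, n > 0 and s >= 0")
    top = max(sample_means)
    threshold = top - c * s / math.sqrt(n)
    # The population with the greatest mean is always retained.
    z = [1 if (m > threshold or m == top) else 0 for m in sample_means]
    return [zi / sum(z) for zi in z]

# Hypothetical sample means with pooled standard deviation s = 1 and n = 4 plots.
print(paulson_type_rule([10.0, 11.8, 12.0], s=1.0, n=4, c=1.0))
```

Because the number of retained populations depends on the sample, the proportions attached to the ordered statistics are themselves random variables, unlike the constant rules considered earlier.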


Example 2. Suppose that a manufactured article has a numerical characteristic
$X$, and a given article is “defective” if it has an $x < a$ and “acceptable” otherwise,
where $a$ is some constant. A consumer requires a large number ($N$) of articles,
which can be supplied by each one of $k$ manufacturers $\pi_i$, $i = 1, 2, \ldots, k$. The
characteristic (say length) of articles produced by $\pi_i$ is known to have a rectangular
distribution with range from $b$ to $b + c_i$, but the $c_i$'s are not known. As a
preliminary step, the consumer has obtained samples of $\nu$ articles from each
manufacturer, and finds the corresponding lengths to be $X_{ij}$, $j = 1, 2, \ldots, \nu$;
$i = 1, 2, \ldots, k$. The statistician is asked to suggest how the consumer should
order a total of $N$ articles from the $k$ manufacturers.

If $a \le b$, the number of defective articles received by the consumer will be
zero no matter how the order is placed. Suppose therefore that $a > b$. Then, if
$n_i$ articles are ordered from $\pi_i$ with $\sum_{i=1}^{k} n_i = N$, the expected number of
defectives equals $N - \sum_{i=1}^{k} (n_i/N)\, g_i$, where $g_i = g(c_i)$ and $g(t)$ is given by

$$g(t) = \begin{cases} N[1 - (a - b)/t] & \text{if } t > a - b, \\ 0 & \text{otherwise.} \end{cases}$$

Writing $\beta(\omega) = (c_1, c_2, \ldots, c_k)$, $\gamma(\omega) = (g_1, g_2, \ldots, g_k)$, it is
clear that the expected number of defectives is of the form $W(\omega, s) + h(\omega)$, where
$h(\omega)$ is independent of $s = (n_1/N, n_2/N, \ldots, n_k/N)$, and $W$ is defined as in (A).
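The expected number of defectives for a given way of splitting the order can be computed directly from the rectangular distribution: an article from a manufacturer with range $c$ is defective with probability $(a - b)/c$ when $c > a - b$, and with probability one otherwise. The following sketch uses hypothetical values of $a$, $b$ and the $c_i$; the function names are illustrative.

```python
def defect_probability(a, b, c):
    """P(X < a) when X has a rectangular distribution on [b, b + c], c > 0."""
    if a <= b:
        return 0.0
    return min(1.0, (a - b) / c)

def expected_defectives(order, a, b, ranges):
    """Expected defectives when order[i] articles come from manufacturer i."""
    return sum(n_i * defect_probability(a, b, c) for n_i, c in zip(order, ranges))

a, b = 1.0, 0.0                    # hypothetical specification limit and lower end point
ranges = [0.5, 2.0, 4.0]           # hypothetical c_1, c_2, c_3
for order in ([1000, 0, 0], [500, 0, 500], [0, 0, 1000]):
    print(order, expected_defectives(order, a, b, ranges))
```

The expected count is linear in the proportions ordered from each manufacturer, so with the $c_i$ known it is minimized by ordering everything from the manufacturer with the greatest $c_i$; the statistical problem arises only because the $c_i$ must be judged from samples.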

We have now to consider what statistics $X_i$ should be used to construct decision
rules. Evidently, we are concerned with a “problem of the greatest $c_i$.”

(a). Assuming $\nu > 1$, let $X_i = \max_j \{X_{ij}\} - \min_j \{X_{ij}\}$. Since the
frequency function of $X_i$ is
$f_i(x) = \nu(\nu - 1) c_i^{-\nu} (c_i - x) x^{\nu - 2}$ if $0 < x < c_i$ and zero
elsewhere, it is a simple matter to show that $c_r < c_s$, $x < y$ imply
$f_r(x) f_s(y) \ge f_s(x) f_r(y)$. It follows that in the class of all impartial rules
which are based on the sample ranges, the uniformly best rule is to order all the $N$
articles from the manufacturer with the greatest sample range.

(b). It may be objected that since the lower end points of all the distributions
are the same, the use of sample ranges to construct decision rules is not particularly
appropriate. Suppose therefore that one takes the statistics
$X_i^* = \max_j \{X_{ij}\} - b$. The frequency function of $X_i^*$ is
$f_i^*(x) = \nu c_i^{-\nu} x^{\nu - 1}$ for $0 < x < c_i$
and $= 0$ elsewhere, and as before, condition (D) holds. Hence the uniformly





best impartial procedure in this class is to order all the $N$ articles from the
manufacturer who supplied the article with the greatest length in the whole
sample of $k\nu$ articles.

It is important to observe that the uniformly best procedures according to
(a) and (b) are not identical, and choosing between them is outside the scope of
Theorem 1. Note also that the statistics $X_i^*$ are sufficient for the $c_i$'s. Therefore,
corresponding to any decision rule there exists a decision rule which is defined in
terms of the $X_i^*$'s and has the same risk function. In particular, there exists a
decision rule in class (b) which is equivalent to the uniformly best impartial rule
in class (a). It would be interesting to know whether this equivalent rule is also
an impartial one.

The two examples given above are purely illustrative, and the reader will
readily construct others in which the statistician is faced with similar problems of
decision. The second example does not, strictly speaking, belong to Case 2, and
the reader is urged to consider some specific instances of this Case. There are
various modifications of “the problem of the greatest one” which may be indicated
here very briefly. These modifications are introduced by placing restrictions
on the class of possible decisions. For example, in Example 1 the statistician may
be required to select two or more varieties, and to assign proportions of the land
to the varieties which he selects in such a way that no variety takes more than
two-thirds of the available land. In that case, the uniformly best procedure (in
the class of all impartial procedures which are based on the $\bar{X}_i$'s and $S^2$) would
be to assign two-thirds of the land to the variety with the greatest observed mean
yield, and the remainder to the variety with the next greatest. The proof is a
slight elaboration of the proof of Theorem 1 and is left to the reader. Again, in
Example 2 the consumer may wish to obtain all the articles which he requires from
some one manufacturer. In that case, assuming that an impartial selection rule
based on the $X_i^*$'s is to be used, it follows trivially from the case considered
previously that the uniformly best procedure is to select the manufacturer with
the greatest $X_i^*$. This is intuitively obvious, but the obvious requires proof (i.e.
verification of (D)), as may be seen by turning to Example 3.

The intuitive notion referred to above is one which is employed quite frequently
in practice. It may be described as follows. Let $X_1$ and $X_2$ be independent
and similar estimates of unknown parameters $m_1$ and $m_2$, and suppose that in a
given instance we have $X_1 > X_2$. “Then it is more reasonable to suppose that
$m_1 > m_2$ than to suppose that $m_1 < m_2$.” Theorem 2 shows that this notion
is well-founded if and only if condition (D) is satisfied, with $\beta = (m_1, m_2)$. The
condition states essentially that “the likelihood of the greater estimate
corresponding to the greater parameter is always $\ge$ the likelihood of the contrary
event,” and it should be observed that $X_1$, $X_2$ being “good” estimates (e.g.
maximum likelihood estimates) does not ensure that this will be the case. The
following application of Theorem 2 is an illustration of these remarks.

Example 3. Suppose that $\pi_i$, $i = 1, 2$ are Cauchy-type populations having
medians $m_i$, and that the set $\Omega$ of possible points $\omega = (m_1, m_2)$ consists of just




the two points $\omega_1 = (1, -1)$ and $\omega_2 = (-1, 1)$. $X_1$ and $X_2$ are single
observations from the two populations, and the statistician is required to decide which
population has the greater median.

Here it would be reasonable for the statistician to use a decision rule, say $d^*$,
which minimizes $r(d \mid \omega) = P(\text{incorrect decision} \mid \omega, d)$, where “$\pi_1$ has
the greater median” and “$\pi_2$ has the greater median” are the two possible decisions.
That this risk function is included in the scheme described by (A) and (C) may
be seen as follows. Let the only admissible values of $s$ be $(1, 0)$ and $(0, 1)$,
corresponding to the decisions “$m_1 > m_2$” and “$m_1 < m_2$” respectively, and setting
$\beta(\omega) = (m_1, m_2)$, define $\gamma(\omega_1) = (1, 0)$, $\gamma(\omega_2) = (0, 1)$. Then for any $d$
such that $s(d)$ equals $(1, 0)$ or $(0, 1)$ only, the expected value of $W$ is for either
$\omega$ the probability of error in using the rule $d$.

Now, if $d = d(X_{(1)}, X_{(2)}) = [\lambda_1, \lambda_2]$ is any impartial decision rule, it will
equal either $[1, 0]$ or $[0, 1]$, corresponding to the decisions “the population with the
greater $X$ has the smaller median” and “the population with the greater $X$ has
the greater median” respectively. Since the frequency function of $X_i$ is
$f_i(x) = 1/\pi[1 + (x - m_i)^2]$, a little calculation shows that in the class of impartial
decision rules a uniformly best one exists, and is given by

$$d^* = \begin{cases} [1, 0] & \text{if } X_{(1)} \cdot X_{(2)} > 2, \\ [0, 1] & \text{otherwise.} \end{cases}$$
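The "little calculation" can be checked numerically. Comparing the likelihoods of the two parameter points for observations $(x_1, x_2)$ reduces to the sign of $(x_1 - x_2)(2 - x_1 x_2)$, so the observation that is greater points to the greater median unless the product of the two observations exceeds 2. The sketch below compares that closed form with a direct likelihood comparison on a few hand-picked points; the function names are illustrative.

```python
import math

def cauchy_pdf(x, m):
    return 1.0 / (math.pi * (1.0 + (x - m) ** 2))

def likelihood_choice(x1, x2):
    """Population (1 or 2) with the greater median under the likelier of
    omega_1 = (1, -1) and omega_2 = (-1, 1)."""
    l1 = cauchy_pdf(x1, 1.0) * cauchy_pdf(x2, -1.0)
    l2 = cauchy_pdf(x1, -1.0) * cauchy_pdf(x2, 1.0)
    return 1 if l1 > l2 else 2

def d_star(x1, x2):
    """Impartial rule: choose the population with the smaller observation when
    the product of the observations exceeds 2, otherwise the greater one."""
    smaller, greater = (1, 2) if x1 < x2 else (2, 1)
    return smaller if x1 * x2 > 2.0 else greater

points = [-4.0, -2.5, -1.0, 0.5, 1.5, 3.0]     # no ties and no product equal to 2
pairs = [(u, v) for u in points for v in points if u != v]
print(all(d_star(u, v) == likelihood_choice(u, v) for u, v in pairs))
```

For instance, with observations 3 and 1.5 the product is 4.5, and the likelier parameter point is the one in which the population yielding 1.5 has the greater median, contrary to the intuitive notion discussed above.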


In conclusion, we remind the reader that although the weight function W 
defined according to (A) is general enough to include all problems of the type 
considered in this paper, the sampling scheme as also the class of decision rules 
to which our results apply is very limited. We have (i) assumed that the samples 
from the k populations are all of the same size, and (ii) given no objective criterion 
for choosing appropriate statistics, and no justification for the use of impartial 
decision rules based on these “appropriate statistics.” In view of the applications,
it would be of interest to extend the general argument of this paper to the 
numerous situations where Theorem 1 does not apply or is otherwise unsuitable. 

The problem of selection was suggested to the author by Professor Hotelling. 
The author would like to acknowledge his indebtedness also to Professor Robbins. 
This paper could not have been written without his constant encouragement and 
helpful suggestions. 


REFERENCES 

[1] Edward Paulson, “A multiple decision procedure for certain problems in analysis of
variance,” Annals Math. Stat., Vol. 20 (1949), pp. 95-98.

[2] Frederick Mosteller, “A k-sample slippage test for an extreme population,” Annals
Math. Stat., Vol. 19 (1948), pp. 58-65.

[3] Abraham Wald, On the Principles of Statistical Inference, Notre Dame Mathematical
Lectures, No. 1 (1942), Notre Dame, Indiana.

[4] Herbert Robbins, “Mixture of distributions,” Annals Math. Stat., Vol. 19 (1948),
pp. 360-369.



COMPLETENESS IN THE SEQUENTIAL CASE 

By E. L. Lehmann and Charles Stein
University of California, Berkeley 

1. Summary. Recently, in a series of papers, Girshick, Mosteller, Savage and
Wolfowitz have considered the uniqueness of unbiased estimates depending only
on an appropriate sufficient statistic for sequential sampling schemes of binomial
variables. A complete solution was obtained under the restriction to bounded
estimates. This work, which has immediate consequences with respect to the
existence of unbiased estimates with uniformly minimum variance, is extended
here in two directions. A general necessary condition for uniqueness is found,
and this is applied to obtain a complete solution of the uniqueness problem when
the random variables have a Poisson or rectangular distribution. Necessary
and sufficient conditions are also found in the binomial case without the restriction
to bounded estimates. This permits the statement of a somewhat stronger
optimum property for the estimates, and is applicable to the estimation of
unbounded functions of the unknown probability.


2. Introduction. The notions of completeness and bounded completeness of
a family of distributions were introduced in [1, 2] in connection with the problems
of similar regions and unbiased estimation. The question of whether either
of these two properties pertains to various families of distributions that are of
interest in statistics was discussed in [2] under the assumption of fixed sample
size. The only sequential problems of this kind that have been treated in the
literature (with quite different terminology) refer to the binomial case. For
this case Girshick, Mosteller and Savage [3] found necessary (and also certain
sufficient) conditions on the sequential sampling scheme for completeness, while
Wolfowitz [4] and Savage [5] gave necessary and sufficient conditions for bounded
completeness.

If $T$ is a random variable distributed over an additive class of sets in some
space according to a distribution $P_\theta^T$ with $\theta$ in some set $\omega$, then the family
$\mathcal{P}^T = \{P_\theta^T \mid \theta \in \omega\}$ of possible distributions of $T$ is said to be complete if

(1)    $\int f(t)\, dP_\theta^T(t) = 0$    for all $\theta \in \omega$

implies

(2)    $f(t) = 0$,    a.e. $\mathcal{P}^T$,

that is, for all $t$ except possibly in a set $N$ for which $P_\theta^T(N) = 0$ for all $\theta \in \omega$.
The family is said to be boundedly complete if this implication holds under
the assumption that $f$ is bounded.
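As an illustration of the implication from (1) to (2), the following sketch checks completeness for a family chosen only for this example: the binomial distributions with fixed sample size n. Condition (1), imposed at n + 1 distinct values of p, is a linear system in f(0), ..., f(n); if its matrix has full rank, only f = 0 satisfies it. The function names are hypothetical, and exact rational arithmetic is used so that the rank is not affected by rounding.

```python
from fractions import Fraction
from math import comb

def binomial_probability_matrix(n, ps):
    """Row i holds P_p(T = t), t = 0..n, for p = ps[i] (T binomial, n fixed)."""
    return [[comb(n, t) * p**t * (1 - p)**(n - t) for t in range(n + 1)]
            for p in ps]

def rank(rows):
    """Rank of a matrix of Fractions by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

n = 4
# n + 1 distinct parameter values in (0, 1); any distinct values would do.
ps = [Fraction(k, n + 2) for k in range(1, n + 2)]
matrix = binomial_probability_matrix(n, ps)

# Full rank means sum_t f(t) P_p(T = t) = 0 at these p forces f = 0.
full_rank = rank(matrix) == n + 1
```

The check is finite only because T takes finitely many values here; for the sequential schemes of this paper the corresponding argument requires the analytic methods of sections 3 and 4.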

The relation of these concepts to the problem of unbiased estimation is an
immediate consequence of a theorem of Blackwell [6]. Let $X$ be a random
variable with distribution $P_\theta^X$, $\theta \in \omega$, and let $T$ be a sufficient statistic for $\theta$. Denote
by $P_\theta^T$ the distribution of $T$, and suppose that $\mathcal{P}^T$ is complete. Then every
function $g(\theta)$ for which there exists an unbiased estimate, that is, a function $\phi$ such
that

    $E_\theta\, \phi(X) = g(\theta)$,    for all $\theta \in \omega$,

possesses an unbiased estimate with uniformly minimum variance. One can say
furthermore that if $\phi(X)$ is any unbiased or bounded unbiased estimate of $g(\theta)$,
then the optimum estimate guaranteed by the above statements is the
conditional expectation of $\phi(X)$ given $T$.
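The last statement can be illustrated in the simplest fixed-sample case. The sketch below assumes n independent Bernoulli trials with T the number of successes (a setting chosen for this example only) and takes as the crude unbiased estimate of p the outcome of the first trial. Conditioning on T gives T/n. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def conditional_expectation_given_sum(phi, n):
    """E[phi(X) | T = t] for n Bernoulli trials, T = number of successes.

    Given T = t all arrangements with t successes are equally likely, so the
    conditional expectation is a plain average and does not depend on p.
    """
    totals, counts = {}, {}
    for x in product((0, 1), repeat=n):
        t = sum(x)
        totals[t] = totals.get(t, 0) + Fraction(phi(x))
        counts[t] = counts.get(t, 0) + 1
    return {t: totals[t] / counts[t] for t in sorted(totals)}

n = 5
first_trial = lambda x: x[0]   # unbiased for p, but ignores trials 2..n
improved = conditional_expectation_given_sum(first_trial, n)
# improved[t] is the value of the conditioned estimate when T = t
```

Here the conditioned estimate coincides with the familiar proportion of successes, as completeness of the binomial family with fixed n would lead one to expect.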

The aim of the present paper is to obtain certain results concerning
completeness in sequential sampling schemes. Some necessary conditions for
completeness are given in section 3, and these are used to obtain necessary and sufficient
conditions for completeness when the random variable being sampled has a
Poisson or rectangular distribution. In section 4 it is shown that certain
necessary conditions given in [3] for the binomial case are also sufficient.

3. A necessary condition for completeness. The sequential sampling schemes
with which we are concerned are of the following nature. There is given a sequence
of real valued random variables $X_1, X_2, \cdots$ with a joint distribution depending
on a real parameter $\theta$, which ranges over a set $\omega$. We shall assume that for
each $m$ the set of variables $X_1, \cdots, X_m$ admits a real valued sufficient statistic
$T_m = t_m(X_1, \cdots, X_m)$ for $\theta$, and that for each $m$ the family $\mathcal{P}^{T_m}$ of
distributions of $T_m$ is complete. We next suppose that there is given a stopping rule,
which is such that after $m$ observations have been taken, the decision of whether
or not to take another observation depends only on the value of
$t_m(X_1, \cdots, X_m)$. It follows (see [6]) that if the total number of observations is $n$
(a random variable which may be infinite), then $(T_n, n)$ is a sufficient statistic
for $\theta$. We shall say that the sequential procedure is complete if the family of
distributions of $(T_n, n)$ is complete. Throughout, we shall assume that all
sequential procedures in question are closed, i.e. that for each $\theta \in \omega$, $n$ is finite
with probability 1.

Let $Y$ be a random variable distributed over a Euclidean space according to
a distribution $P_\theta^Y$ with $\theta$ in $\omega$. We shall say that a point $y$ lies in the positive
sample space of $Y$ if there exists $\theta \in \omega$ such that every open set containing $y$
has positive probability for this $\theta$, and that $y$ is an impossible point if it lies in
the complement of the positive sample space. Consider now a sequential sampling
scheme as described above. For any integers $m < p$ we shall denote by $W_p^m$ the
positive sample space of $T_p$ given the first $m$ steps of the stopping rule, that is,
given for $i = 1, \cdots, m$ the set of values of $T_i$ for which sampling is
discontinued after the $i$th observation. Since all the $T$'s are real valued, the sets $W_p^m$
are sets of real numbers satisfying the obvious condition $W_p^m \supseteq W_p^{m+1}$. The
union $\bigcup_m S_m$ ($S_m$ is the set of points of $W_m^{m-1}$ for which no $(m+1)$st observation is
taken) will be called the set of stopping or boundary points; the points belonging
to some $W_m^{m-1} - S_m$ are the continuation points.

We need the following

Lemma 1. A necessary condition for a sequential procedure of the type described
above to be complete is that every procedure obtained from the given one by
truncation be complete.¹

This is an immediate consequence of the following more general

Lemma 2. Let $X_1, X_2, \cdots$ be as before a sequence of random variables such
that for each $m$ the set $X_1, \cdots, X_m$ admits a real valued sufficient statistic
$T_m = t_m(X_1, \cdots, X_m)$. Let $S_1, S_2, \cdots, S_r$ each be a complete, closed, sequential
procedure based on these sufficient statistics. Let $S_1 \cup S_2 \cup \cdots \cup S_r$ denote the
sequential procedure according to which we continue taking observations until at least one
of the stopping rules $S_1, \cdots, S_r$ tells us to stop. Then the procedure $S_1 \cup \cdots \cup S_r$
is complete.

This clearly implies Lemma 1. For if one takes for $S_1$ any closed, complete
sequential procedure and for $S_2$ a procedure of fixed sample size, then $S_1 \cup S_2$
is the associated truncated procedure.
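The construction of $S_1 \cup S_2$, and truncation as a special case of it, can be sketched as follows. The representation of a stopping rule as a function of $(m, T_m)$ that returns True when sampling stops, the particular rule `s1`, and all names below are assumptions made for this illustration.

```python
def union(*rules):
    """Combined rule: stop as soon as at least one of the given rules says stop."""
    return lambda m, t: any(rule(m, t) for rule in rules)

def fixed_sample_size(N):
    """Rule that stops exactly when N observations have been taken."""
    return lambda m, t: m >= N

def sample_size(rule, statistics):
    """Number of observations taken under `rule`, given the values T_1, T_2, ..."""
    for m, t in enumerate(statistics, start=1):
        if rule(m, t):
            return m
    return None  # the rule did not stop within the values supplied

# Hypothetical rule: stop once the sufficient statistic reaches 3.
s1 = lambda m, t: t >= 3
s2 = fixed_sample_size(4)
truncated = union(s1, s2)      # s1 truncated after 4 observations

path = [0, 1, 1, 2, 3, 3]      # T_1, ..., T_6 along one imagined sample path
n1 = sample_size(s1, path)
n2 = sample_size(s2, path)
n = sample_size(truncated, path)
```

Along any path the combined rule takes min(n1, n2) observations, which is the relation between n, n1 and n2 used in the proof of Lemma 2.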

Proof of Lemma 2. It is sufficient to prove the result for the case $r = 2$.
Let $n_1, n_2, n$ denote the number of observations taken under $S_1$, $S_2$, $S_1 \cup S_2$
respectively. Then $n = n_1$ if $n_1 \le n_2$, $n = n_2$ if $n_1 \ge n_2$. Let $f$ be any function
on $S_1 \cup S_2$ such that

    $E_\theta f(T_n, n) = 0$    for all $\theta \in \omega$.

Then

    $E_\theta E[f(T_n, n) \mid T_{n_1}, n_1] = 0$,
    $E_\theta E[f(T_n, n) \mid T_{n_2}, n_2] = 0$,    for all $\theta \in \omega$.

Since $S_1$ and $S_2$ are complete it follows that

    $E[f(T_n, n) \mid T_{n_1} = t_1, n_1 = \gamma_1] = E[f(T_n, n) \mid T_{n_2} = t_2, n_2 = \gamma_2] = 0$,    a.e.

Hence

(3)    $0 = P(n_1 \le n_2 \mid T_{n_1} = t_1, n_1 = \gamma_1)\, f(t_1, \gamma_1)$
           $+\, P(n_1 > n_2 \mid T_{n_1} = t_1, n_1 = \gamma_1)\, E[f(T_{n_2}, n_2) \mid T_{n_1} = t_1, n_1 = \gamma_1, n_1 > n_2]$,

and the analogous condition holds with the subscripts 1 and 2 interchanged.

We shall prove that $f(T_n, n) = 0$, a.e., by induction over the possible values
of $n$. Suppose, therefore, that for some integer $m$

    $P_\theta(n \le m,\ f(T_n, n) \ne 0) = 0$.

(This is certainly true for $m = 0$.) It then follows that if we take $\gamma_1 = m + 1$
in (3) the second term of the right hand side vanishes, so that

    $0 = P(n = n_1 \mid T_{n_1} = t_1, n_1 = m + 1)\, f(t_1, m + 1)$.

Hence,

    $P_\theta(n = n_1 = m + 1,\ f(T_{n_1}, n_1) \ne 0)$
        $\le P_\theta(n = n_1 = m + 1,\ P(n = n_1 \mid T_{n_1}, n_1) = 0) = 0$.

Analogously we see that

    $P_\theta(n = n_2 = m + 1,\ f(T_{n_2}, n_2) \ne 0) = 0$

and, adding, that

    $P_\theta(n = m + 1,\ f(T_n, n) \ne 0) = 0$.

This completes the induction.

¹ The authors would like to thank Mr. E. Fay for pointing out an error in the original
proof of this Lemma.

We need further the notion of strong completeness. Consider a random
variable $W = (U, V)$, suppose that the distribution of $W$ depends on $\theta$, and that
$U$ is a sufficient statistic for $\theta$. Let $P_u^V$ be the conditional distribution of $V$ given
$U = u$ (this is independent of $\theta$ since $U$ is a sufficient statistic for $\theta$) and let
$\mathcal{P}^{V \mid U} = \{P_u^V\}$. We say that the pair $\mathcal{P}^U$, $\mathcal{P}^{V \mid U}$ is strongly complete if the conditions

    (i)  $E_\theta f(V)$ exists for all $\theta$,

    (ii) $E(f(V) \mid U = u) = 0$ for almost all $u$,

imply

    $f(v) = 0$,    a.e. $\mathcal{P}^V$.

For brevity, we shall then usually say that $\{P_u^V\}$ is strongly complete.

We can now state the following necessary condition for completeness.

Theorem. If a closed sequential procedure of the type considered above is
complete, then

    (i) $S_m$ is almost empty for every $m$ for which $W_{m+1}^{m-1} - W_{m+1}^{m}$ is almost empty,

    (ii) for each $m$ for which $S_m$ is not almost empty, the family of conditional
distributions of $T_m$ given $T_{m+1} = t$ (as $t$ ranges over $W_{m+1}^{m-1} - W_{m+1}^{m}$) is strongly complete.

Proof. For any $t \in W_{m+1}^{m-1} - W_{m+1}^{m}$ the positive sample space of $T_m$ given $T_{m+1} = t$
is clearly contained in $S_m$. Suppose first that (ii) is violated and consider the
sequential procedure obtained from the given one by truncation after $m + 1$
observations. By the lemma it will be enough to show that the truncated procedure
is not complete. For this purpose let us assume that regardless of the stopping
rule all $m + 1$ variables $X_1, \cdots, X_{m+1}$ are observed. We want to construct an
estimate of zero based on the sufficient statistic for the truncated procedure.
This estimate must be a function of $T_1$ for $T_1 \in S_1$, of $T_2$ for $T_2 \in S_2$, etc. That is,
although we may imagine that the full sample of size $m + 1$ is taken, we must
be careful not to use observations that are impossible when the stopping rule
is followed.

We shall now show that there exists an unbiased estimate of zero which is
zero over $S_1, \cdots, S_{m-1}$, equal to $f(T_m)$ on $S_m$ and $g(T_{m+1})$ on $W_{m+1}^{m}$, where $f$
and $g$ will be defined below. Since expectation equals expectation of conditional
expectation, a statistic is an unbiased estimate of zero if its expectation exists
and its conditional expectation given $T_{m+1} = t$ is zero for almost all $t$. In our
case this condition is equivalent to

(4)    $\int_{S_m} f(u)\, dP_m(u \mid T_{m+1} = t) + g(t) \int_{W_m^{m-1} - S_m} dP_m(u \mid T_{m+1} = t) = 0$
           for almost all $t \in W_{m+1}^{m}$,

(5)    $\int_{S_m} f(u)\, dP_m(u \mid T_{m+1} = t) = 0$
           for almost all $t \notin W_{m+1}^{m}$, i.e. for almost all $t \in W_{m+1}^{m-1} - W_{m+1}^{m}$, since
           $t \notin W_{m+1}^{m-1}$ implies $P(S_m \mid T_{m+1} = t) = 0$,

together with the existence of $E_\theta(f(T_m) \mid n = m)$ and $E_\theta(g(T_{m+1}) \mid n = m + 1)$.
Since (ii) does not hold there exists $f$ not vanishing a.e. such that
$E_\theta(f(T_m) \mid n = m)$ exists and (5) is satisfied. If $g$ is defined by (4),
$E_\theta(g(T_{m+1}) \mid n = m + 1)$ exists, and this completes the proof of the necessity
of (ii).

The necessity of (i) is now obvious. For if (i) is violated, then (5) is satisfied
vacuously, and we can take $f$ to be an arbitrary positive valued function (for
example) and (4) will then be satisfied.

As immediate consequences of this theorem we shall obtain two conditions, 
which are easier to apply than condition (ii). 

Corollary 1. A necessary condition for completeness is that for no $m$ there
exists a subset $A$ of $S_m$ such that

    $P_\theta(A) > 0$ for some $\theta$

and

    $P(A \mid T_{m+1} = t) = 0$ for almost all $t \in W_{m+1}^{m-1} - W_{m+1}^{m}$.

Corollary 2. Suppose that the sequence of $X$'s is such that in the nonsequential
case for all $m$, $p$ with $m < p$ the positive sample space of $T_m$ given $T_p = t$ is the
intersection of the unconditional positive sample space of $T_m$ with the interval $[0, t]$.
Then a necessary condition for a sequential procedure to be complete is that each
$S_m$ differ from a half-open interval (possibly empty) $[a_m, b_m)$ with $a_m \le b_m$, $a_1 = 0$,
$a_{m+1} = b_m$, by a set of probability 0.

Proof. Let $r$ be the first value of $m$ for which this condition is not satisfied.
Then there exists $c > b_{r-1}$ such that the set $S_r \cap [c, \infty)$ and the part of
$[b_{r-1}, c)$ not belonging to $S_r$ both have positive probability. The result now
follows from Corollary 1 if one puts $A = S_r \cap [c, \infty)$.
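The interval condition of Corollary 2 can be checked mechanically when the stopping sets are given as exact intervals. The following is a minimal sketch, assuming each $S_m$ is supplied as a pair $(a_m, b_m)$ standing for $[a_m, b_m)$ and ignoring the qualification about sets of probability 0; the function name is hypothetical.

```python
def are_adjoining(intervals):
    """Check a_1 = 0, a_m <= b_m and a_{m+1} = b_m for [a_m, b_m), m = 1, 2, ...

    An empty stopping set S_m is represented by a pair with a_m == b_m.
    """
    expected_start = 0
    for a, b in intervals:
        if a != expected_start or a > b:
            return False
        expected_start = b
    return True

# S_1 = [0, 1), S_2 empty, S_3 = [1, 4): the condition holds.
ok = are_adjoining([(0, 1), (1, 1), (1, 4)])
# S_1 = [0, 1), S_2 = [2, 3): the values in [1, 2) are skipped, so it fails.
bad = are_adjoining([(0, 1), (2, 3)])
```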

Next we consider some examples.

Example 1. Let $X_1, X_2, \cdots$ be independently normally distributed with
known variance and unknown mean $\theta$. In this case $T_m = \sum_{i=1}^{m} X_i$, and since
the positive sample space of $T_{m+1}$ is the infinite interval regardless of the values
of $T_1, \cdots, T_m$ it follows from condition (i) of the theorem that no sequential
procedure is complete, with the trivial exception of the procedures with fixed
sample size.





Example 2. Let $X_1, X_2, \cdots$ be independently uniformly distributed over
the interval $(0, \theta)$, $0 < \theta < \infty$. Then $T_m = \max(X_1, \cdots, X_m)$ and Corollary 2
gives a necessary condition for completeness. If the procedure is truncated we
can deduce sufficiency of this condition from (5). However, this proof does not
apply to the general case. The following proof of sufficiency is similar to some
of the proofs in [3, 4, 5].

Suppose $S_1, S_2, \cdots$ form a set of adjoining intervals (some of them possibly
empty), $S_m = [a_m, b_m)$, and suppose there is a non-zero unbiased estimate of
zero, $\phi = \phi(T_n, n)$. Let $m$ be the smallest integer for which $\phi$ is not zero almost
everywhere on $S_m$. Then

    $E_\theta(\phi) = P_\theta(n = m)\, E_\theta(\phi \mid n = m) + \sum_{j=m+1}^{\infty} P_\theta(n = j)\, E_\theta(\phi \mid n = j) = 0$,

and hence

(6)    $P_\theta(n = m)\, E_\theta(\phi \mid n = m) = -\sum_{j=m+1}^{\infty} P_\theta(n = j)\, E_\theta(\phi \mid n = j)$.

Now the right hand side of (6) is zero when $\theta \le b_m$, since it is then impossible
that $T_j \in S_j$ for any $j > m$. Hence

    $E_\theta[\phi(T_m, m) \mid a_m \le T_m < b_m] = 0$    for all $\theta \le b_m$,

and therefore

    $\int \cdots\, dx = 0$    for all $\theta$ in $[a_m, b_m]$.

But this implies $\phi(x, m) = 0$ almost everywhere in $S_m$, which is a contradiction.

Example 3. Let $X_1, X_2, \cdots$ be independently distributed according to a
Poisson distribution with mean $\theta$. Then $T_m = \sum_{i=1}^{m} X_i$ and again we can apply
Corollary 2. To prove sufficiency we proceed as in example 2. If the condition of
Corollary 2 is satisfied we may write without ambiguity $\phi(T_n)$ for $\phi(T_n, n)$.
Let $c$ be the smallest value of $T_n$ for which $\phi \ne 0$. Then if the probability
of $T_n = j$ is $k(j)\, \theta^j e^{-n(j)\theta}$, the identity $E_\theta(\phi) = 0$ implies

    $\phi(c)\, k(c)\, \theta^c e^{-n(c)\theta} = -\sum_{j=c+1}^{\infty} \phi(j)\, k(j)\, \theta^j e^{-n(j)\theta}$.

Dividing this equation by $\theta^c$ and letting $\theta$ tend to zero we see that the right
hand side tends to zero, which implies $\phi(c) = 0$ and hence a contradiction.
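The form of the probabilities used in Example 3 can be examined numerically. The sketch below assumes a particular stopping rule that satisfies the condition of Corollary 2, namely $S_m = [(m-1)^2, m^2)$, so that sampling stops at step m when $T_m < m^2$; this rule, the truncation point J and all names are assumptions made for the illustration. It computes coefficients k(j) by recursion over m and confirms that the probabilities $k(j)\theta^j e^{-n\theta}$ of the boundary points, with n the step at which j is a stopping value, sum to 1 up to truncation error.

```python
from math import exp, factorial

J = 40  # truncate the sample space at T <= J

def stops(m, t):
    # Hypothetical rule with S_m = [(m-1)^2, m^2): stop at step m when T_m < m^2.
    return t < m * m

def boundary_coefficients(J):
    """Return {j: (n, k)} with P(T_n = j) = k * theta**j * exp(-n * theta)."""
    coeffs = {}
    # w[t] * theta**t * exp(-m*theta) = P(no stop before step m, T_m = t)
    w = [1 / factorial(t) for t in range(J + 1)]
    m = 1
    while any(w):
        nxt = [0.0] * (J + 1)
        for s, ws in enumerate(w):
            if ws == 0:
                continue
            if stops(m, s):
                coeffs[s] = (m, ws)          # s is a boundary value at step m
            else:
                for t in range(s, J + 1):    # one more Poisson observation t - s
                    nxt[t] += ws / factorial(t - s)
        w, m = nxt, m + 1
    return coeffs

def total_probability(coeffs, theta):
    return sum(k * theta**j * exp(-n * theta) for j, (n, k) in coeffs.items())

coeffs = boundary_coefficients(J)
total = total_probability(coeffs, 1.5)   # close to 1 for moderate theta
```

Because the intervals adjoin, each value j of $T_n$ determines the number of observations, which is why $\phi(T_n)$ may be written for $\phi(T_n, n)$.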


4. The binomial case. As was mentioned in section 1, the problem of bounded
completeness was solved for the binomial case in [3, 4, 5]. Since presumably one is
unwilling to estimate the bounded parameter $p$ by means of an unbounded
estimate, further work here may seem unnecessary. However, the problem of
completeness seems to be of interest for two reasons. If the procedure is boundedly
complete without being complete then, even though one may be reluctant
to use such an estimate, there may exist an unbounded unbiased estimate of $p$,
which for some values of $p$ has smaller variance than the minimum variance
bounded estimate. (An example of this is given in [2].) Since this possibility is
ruled out when the procedure is complete, it is seen that completeness permits
statement of a stronger optimum property. Apart from this one may be interested
in estimating some unbounded function of $p$ such as $1/p$. In this case bounded
completeness does not permit any statements concerning existence of optimum
estimates.

In the present section we shall change our notation somewhat. We are
concerned with a sequence of independent trials with constant probability $p$ of
success. On the basis of $m$ trials the total number $y$ of successes is a sufficient
statistic for $p$. Instead of representing the sufficient statistic for the sequential
procedure by $(y, n)$, we shall use the representation $(x, y)$ where $x$ is the total
number of failures, so that $x + y = n$. The couples $(x, y)$ may be thought of as
making up the points with integral-valued coordinates of the first quadrant
of an $xy$-plane, and as before may be classified as boundary points, continuation
points, and impossible points. Adopting the terminology of [3], we shall call
the value of $x + y$ the index of the point $(x, y)$, so that the points of index $m$
lie on the line $x + y = m$.

Girshick, Mosteller and Savage defined a sequential procedure to be simple
if for each $m$ the continuation points of index $m$ form an interval. They proved
that a necessary and sufficient condition for a bounded procedure to be
complete is that it be simple. (A procedure is said to be bounded if there exists $N$
so that the number of observations is $\le N$.) They also showed that in general
simplicity is not sufficient for completeness. However, it was shown later [4, 5]
that simplicity is sufficient for bounded completeness.
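The classification of lattice points and the definition of simplicity can be sketched as follows. The representation of a stopping rule as a function `stop(x, y)` of the numbers of failures and successes, the two example rules, and all names are assumptions made for this illustration; only points of index up to a chosen bound are examined.

```python
def classify(stop, max_index):
    """Label the points (x, y) with x + y <= max_index.

    A point is 'boundary' if it can be reached and sampling stops there,
    'continuation' if it can be reached and sampling goes on, and
    'impossible' if it cannot be reached under the stopping rule.
    """
    labels = {}
    reachable = {(0, 0)}
    for m in range(max_index + 1):
        nxt = set()
        for x in range(m + 1):
            y = m - x
            if (x, y) not in reachable:
                labels[(x, y)] = "impossible"
            elif stop(x, y):
                labels[(x, y)] = "boundary"
            else:
                labels[(x, y)] = "continuation"
                nxt.update({(x + 1, y), (x, y + 1)})  # a failure or a success
        reachable = nxt
    return labels

def is_simple(labels, max_index):
    """Continuation points of each index must form an interval."""
    for m in range(max_index + 1):
        xs = [x for x in range(m + 1) if labels[(x, m - x)] == "continuation"]
        if xs and xs[-1] - xs[0] + 1 != len(xs):
            return False
    return True

# Hypothetical bounded rule: stop at the 2nd success or the 3rd failure.
curtailed = lambda x, y: y >= 2 or x >= 3
# Hypothetical rule with a boundary point between two continuation points.
gap = lambda x, y: (x, y) == (1, 1) or x + y >= 3

simple_rule = is_simple(classify(curtailed, 4), 4)
non_simple_rule = is_simple(classify(gap, 3), 3)
```

Under the second rule the points (2, 0) and (0, 2) of index 2 are continuation points while (1, 1) between them is a boundary point, so the continuation points of index 2 do not form an interval.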

A sequential procedure is said to be closed if the probability of termination is
unity for every $p$ with $0 < p < 1$. It was proved by Girshick, Mosteller and
Savage that a necessary condition for completeness of a closed sequential
procedure is that no procedure obtained from the given one by removing a boundary
point be closed. (Removing a boundary point here means converting it into a
continuation point.) We shall prove below that this condition together with
simplicity is also sufficient for completeness. An interesting question is whether
these two conditions are sufficient for completeness for the general sequential
schemes considered in section 2, when simplicity is replaced by the condition
that every procedure obtained from the given one by truncation is complete,
and when the second condition is modified by the appropriate null set
qualifications. It is easily seen that both of these conditions are necessary.

The following definitions will be needed below. A boundary point $(a, b)$ is a
lower (upper) boundary point if for some $x < 0$ ($> 0$) the point $(a + x, b - x)$
is a continuation point. An impossible point $(a, b)$ is a lower (upper) impossible
point if for some $x < 0$ ($> 0$) the point $(a + x, b - x)$ is either a continuation
point or a boundary point.





If the procedure is unbounded every boundary point is either a lower or an
upper boundary point. If it is simple, no point can be both an upper and a lower
boundary point. The same remarks apply to impossible points.

Theorem. A necessary and sufficient condition for completeness of a closed
procedure in the binomial case is that

    (i) the procedure is simple,

and

    (ii) the removal of any boundary point destroys closure.

Proof. Necessity was proved in [3] as was sufficiency for bounded procedures.
Sufficiency for unbounded procedures will follow from the following two facts,
which we shall prove below.

I. Suppose (i) holds and there exist numbers $a, M > 0$ such that for all boundary
points $(x, y)$ of index $m \ge M$ the ratio $y/x \ge a$. Let $f(x, y)$ be a non-zero
unbiased estimate of zero defined over the set $B$ of boundary points, and let $m_0$
be the smallest index for which there are points with $f(x, y) \ne 0$. Then $f(x, y) = 0$
for all lower boundary points of index $m_0$.

II. If (i) holds and if for every positive number $a$ there exist infinitely many
boundary points $(x, y)$ with $y/x \le a$, then one may remove any lower boundary
point without destroying closure.

Suppose now that a sequential procedure satisfies (i) and (ii). Then, since no
lower boundary point can be removed without destroying closure, it follows
from II. that there exist $a$ and $M$ such that $y/x \ge a$ for all boundary points of
index $\ge M$. Hence if $f(x, y)$ is an unbiased estimate of zero, and if $m_0$ is defined
as in I., $f(x, y) = 0$ for all lower boundary points of index $m_0$. Because of
symmetry the statements concerning upper boundary points analogous to I. and II.
also hold. It then follows analogously that $f(x, y) = 0$ for all upper boundary
points of index $m_0$. But for a simple unbounded procedure every boundary
point is either an upper or a lower boundary point, and hence we obtain a
contradiction with the definition of $m_0$.

Before proving I. and II. we state the following corollary, which generalises
an example given in [3].

Corollary. A sequential procedure that is not bounded and that has a finite
non-zero number of lower boundary points is not complete. The analogous result
holds for upper boundary points.

Proof of Corollary. This follows easily from II., since if a procedure of
this type is to be closed there must exist for each $a > 0$ infinitely many upper
boundary points $(x, y)$ with $y/x \le a$.

In the remainder of the paper we are concerned with the proofs of I. and II.

Proof of I. Assume I. to be false, and let $(x_0, y_0)$ be the lowest boundary
point of index $m_0$ for which $f(x_0, y_0) \ne 0$. Then $y > y_0$ for all other boundary
points $(x, y)$ for which $f(x, y) \ne 0$. Hence if the probability of a point $(x, y)$
is $c(x, y)\, p^y q^x$ and if $k(x, y) = c(x, y) f(x, y)$,

    $k(x_0, y_0)\, p^{y_0} q^{x_0} = -\sum k(x, y)\, p^y q^x$,

where the summation extends over all boundary points of index $\ge m_0$ for which
$y > y_0$. Dividing both sides by $p^{y_0}$ we see that

    $k(x_0, y_0)\, q^{x_0} = -p \sum k(x, y)\, p^{y - y_0 - 1} q^x$.

If we can show that the expression multiplying $-p$ on the right hand side
remains bounded as $p$ tends to zero, we have a contradiction. For letting $p$
tend to zero, we would then see that the right hand side tends to zero and the
left hand side to $k(x_0, y_0)$, and hence that $f(x_0, y_0) = 0$.

To prove this, note that

    $\left| \sum k(x, y)\, p^{y - y_0 - 1} q^x \right| \le \sum |k(x, y)|\, p^{y - y_0 - 1}$.

The right hand side is a power series in $p$. We shall show that this series
converges for some $p_0 > 0$. This implies uniform convergence for $|p| < p_0$, and
therefore the series remains bounded at $p = 0$. By assumption there exist
numbers $a$ and $M'$ such that $y/x \ge a$ for all boundary points with $y > M'$. From
now on we shall consider all series as being summed over the set of boundary
points for which $y > M'$ and hence $y/x \ge a$. Since only a finite number of
terms are omitted this does not affect any convergence properties.

Let $0 < p_1 < 1$. Then, since $f$ is an unbiased estimate of zero, the series

    $\sum k(x, y)\, p_1^y q_1^x$

converges absolutely. Hence, with $p_0 = q_1^{1/a} p_1$,

    $\sum |k(x, y)|\, p_1^y q_1^x \ge \sum |k(x, y)|\, (q_1^{1/a} p_1)^y = \sum |k(x, y)|\, p_0^y$,

and consequently the last series is convergent.
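The coefficients $c(x, y)$ appearing in the proof of I. are the numbers of sample paths from the origin to the boundary points, and for a closed procedure the probabilities $c(x, y) p^y q^x$ add up to one. A minimal sketch, assuming a bounded rule given as a function `stop(x, y)` (the rule below and all names are hypothetical), with exact rational arithmetic:

```python
from fractions import Fraction

def boundary_path_counts(stop, max_index):
    """c(x, y): number of paths from (0, 0) that end at the boundary point (x, y).

    Assumes the rule always stops by index `max_index`.
    """
    paths = {(0, 0): 1}
    counts = {}
    for _ in range(max_index + 1):
        nxt = {}
        for (x, y), c in paths.items():
            if stop(x, y):
                counts[(x, y)] = c
            else:
                # continue with a failure (x + 1) or a success (y + 1)
                for point in ((x + 1, y), (x, y + 1)):
                    nxt[point] = nxt.get(point, 0) + c
        paths = nxt
    return counts

def total_probability(counts, p):
    """Sum of c(x, y) * p**y * q**x over the boundary points."""
    q = 1 - p
    return sum(c * p**y * q**x for (x, y), c in counts.items())

# Hypothetical bounded rule: stop at the 2nd success or the 3rd failure.
curtailed = lambda x, y: y >= 2 or x >= 3
counts = boundary_path_counts(curtailed, 4)
total = total_probability(counts, Fraction(1, 3))
```

An unbiased estimate of zero is then a function f on the boundary points with $\sum c(x, y) f(x, y) p^y q^x = 0$ identically in p, which is the identity from which the proof starts.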

Proof of II. Let $R$ be any closed simple procedure satisfying the conditions
of II., and let $(x_0, y_0)$ be any lower boundary point of $R$. We denote by $R^*$ the
procedure obtained from $R$ by taking $(x_0, y_0)$ to be a continuation point and
by $n^*$ the number of observations for $R^*$.

We first prove that any upper impossible point of $R$ is also an impossible
point of $R^*$. The negation of this would imply that one can get from a lower
boundary point to an upper impossible point going only through impossible
points. This would require at least one step of either of the following kinds:

    Lower impossible point -> upper impossible point,

    Lower boundary point -> upper impossible point.

One can easily convince oneself with the aid of a diagram that any procedure
under which such steps are permitted cannot be simple.

Let $0 < p, \pi < 1$, and let $a$ be such that $0 < a < p/q$. If $p$ is the true
probability of success, $y/x$ tends in probability to $p/q$, and hence there exists $N$
such that

    $P(y/x \ge a \mid p) > \pi$

whenever the index of $(x, y)$ exceeds $N$. By assumption there exists $N_1 > N$
and a boundary point $(x_1, y_1)$ of $R^*$ of index $N_1$ such that $y_1/x_1 \le a$. Then the
probability exceeds $\pi$ that the random point $(x, y)$ of index $N_1$ will lie above
$(x_1, y_1)$. Since $(x_1, y_1)$ is a boundary point, the probability is therefore greater
than $\pi$ that the point $(x, y)$ of index $N_1$ is either an upper impossible point for
$R$ and hence impossible for $R^*$, or a stopping or continuation point for $R$. We
have therefore proved that the probability is $> \pi$ that either $n^* \le N_1$ or the
point $(x, y)$ of index $N_1$ is a continuation point of $R$.

But given that one has reached a continuation point $(a, b)$ of $R$, there exists
$N_2$ such that

    $P(n^* \le N_2 \mid p, (a, b)) \ge \pi$.

For

    $P(n^* > N_2 \mid (a, b)) = P(n > N_2 \mid (a, b)) \to 0$ as $N_2 \to \infty$.

Since there are only a finite number of continuation points of index $N_1$, it is
now clear that there exists $N_3$ such that

    $P(n^* \le N_3 \mid p) \ge \pi + \pi - 1$,

which can be made arbitrarily close to 1 by proper choice of $\pi$. Therefore $R^*$
is closed.


REFERENCES 

[1] E. L. Lehmann and H. Scheffé, "On the problem of similar regions," Proc. Nat. Acad.
    Sci., Vol. 33 (1947), pp. 382-386.

[2] E. L. Lehmann and H. Scheffé, "Completeness, similar regions and unbiased
    estimation," unpublished.

[3] M. A. Girshick, Frederick Mosteller and L. J. Savage, "Unbiased estimates for
    certain binomial sampling problems, with applications," Annals of Math. Stat.,
    Vol. 17 (1946), pp. 13-23.

[4] J. Wolfowitz, "On sequential binomial estimation," Annals of Math. Stat., Vol. 17
    (1946), pp. 489-493.

[5] L. J. Savage, "A uniqueness theorem for unbiased sequential binomial estimation,"
    Annals of Math. Stat., Vol. 18 (1947), pp. 296-297.

[6] D. Blackwell, "Conditional expectation and unbiased sequential estimation,"
    Annals of Math. Stat., Vol. 18 (1947), pp. 105-110.



SOME ESTIMATES AND TESTS BASED ON THE r SMALLEST VALUES' 

IN A SAMPLE 

By John E. Walsh¹

The Rand Corporation 

1. Summary. Let us consider a situation where only the $r$ smallest values of
a sample of size $n$ are available. This paper investigates the case where $n$ is
large and $r$ is of the form $pn + O(\sqrt{n})$.

Properties of some well known non-parametric point estimates, confidence
intervals and significance tests for the 100p% point of the population are
investigated. If the sample is from a normal population, these non-parametric
estimates and tests have high efficiencies for small values of $p$ (at least 95%
if $p \le 1/10$).

The other results of the paper are restricted to the special case of a normal
population. Asymptotically "best" estimates and tests for the population
percentage points are derived for the case in which the population standard
deviation is known. For the case in which the population standard deviation is
unknown, asymptotically most efficient estimates and tests can be obtained
for the smaller population percentage points by suitable choice of $p$ and $O(\sqrt{n})$.

The results derived have application in the field of life testing. There the
variable associated with an item is the time to failure and the $r$ smallest sample
values can be obtained without the necessity of obtaining the remaining values
of the sample. By starting with a larger number of units but stopping the
experiment when only a small percentage of the units have "died", it is often possible
(using the results of this paper) to obtain the same amount of "information"
with a substantial saving in cost and time over that which would be required
if a smaller number of units were used and the experiment conducted until all
the units have "died". Jacobson called attention to applications of this type
in [1].

¹ The author would like to express his appreciation to Max Halperin for calling
attention to this problem and for valuable advice and assistance in the preparation of the paper.

2. Introduction and statement of results. In life testing, information
concerning the smaller population percentage points may be of primary interest.
The principal aim of this paper is to investigate the properties of some well
known non-parametric estimates and tests of the smaller population percentage
points which are based on statistics of the type used for the sign test. These
non-parametric results are easy to apply and have several other desirable
properties (see Theorem 1 and its discussion). In particular, if the 100p% point
is to be investigated, it is only necessary to fail approximately 100p% of the
number of starting items to obtain the required statistics ($n$ large). Thus, if
the non-parametric results should also happen to be reasonably efficient, they
would appear to be ideal for a life testing situation where a smaller population
percentage point is to be investigated.
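The kind of sign-test statement on which such non-parametric results rest can be sketched as follows. This is a standard fact for samples from a continuous population and is given only as an illustration, not as the specific estimates or tests studied in the paper: the r-th smallest of n observations exceeds the 100p% point exactly when at most r - 1 observations fall at or below that point, an event with a binomial probability. The function name is hypothetical.

```python
from math import comb

def prob_order_stat_exceeds_point(n, r, p):
    """P(r-th smallest of n observations > 100p% point), continuous population.

    Equals P(at most r - 1 of the n observations lie at or below the point),
    where each observation lies at or below it with probability p.
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r))

# With 20 starting items, the 5th failure time exceeds the 10% point
# with this probability; only 5 of the 20 items need to be failed.
confidence = prob_order_stat_exceeds_point(20, 5, 0.10)
```

Only the r smallest values enter such a statement, which is why about 100p% of the starting items need to be failed when n is large.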

Examination shows that life tests of the "wear out" type sometimes yield
empirical distributions which are approximately normal. Also in many cases an
approximately normal distribution can be obtained by an appropriate monotonic
change of variable. Thus the case in which the $n$ observations are a sample from
a normal population will receive special consideration in this paper.

Investigation of the efficiency of the non-parametric estimates and tests will
be limited to the situation where the $n$ observations are a sample from a normal
population. Three cases will be considered:

(A). Asymptotic efficiency of the non-parametric results as compared with
     the corresponding most efficient results based on the entire sample
     (population variance unknown).

(B). Asymptotic efficiency of the non-parametric results as compared with
     the corresponding most efficient results based on the $pn + O(\sqrt{n})$
     smallest order statistics for the situation where the variance of the
     normal population is known.

(C). Asymptotic efficiency of the non-parametric results as compared with
     the corresponding most efficient results based on the $\beta n + O(\sqrt{n})$
     smallest order statistics where $\beta$ is slightly greater than $p$ (population
     variance unknown).

The definition of "asymptotic" efficiency together with some of its properties
is given in Section 3. Only asymptotic efficiencies will be considered.² However,
the efficiencies obtained for the asymptotic case would seem to represent lower
bounds of the efficiencies for the corresponding non-asymptotic cases since
experience indicates that the efficiency of non-parametric results usually
decreases as the sample size increases.

² Some power function comparisons for the non-asymptotic case were given by Paul
H. Jacobson in [1].

First let us consider case (A). From Theorem 3, the asymptotically most
efficient results for estimating or testing the 100p% population point on the basis
of the entire sample (population variance unknown) are furnished by the
non-central t-statistic. An expression for the asymptotic efficiency of the
non-parametric results as compared with the corresponding results based on the
non-central t-statistic is given in the Corollary to Theorem 3. The reciprocal of this
efficiency represents the factor by which the original number of starting items
must be multiplied if the non-parametric results are to asymptotically furnish
the same "information" as the non-central t-statistic applied to the original
number of starting items. Table 1 contains values of this factor. Although a larger
number of starting items are used by the "information equivalent"
non-parametric results, a noticeably smaller number of items are failed. The factor by
which the number of items failed is decreased equals the value of $p$ multiplied
by the factor by which the number of starting items was increased for the
"equivalent" non-parametric result. Table 2 contains a list of some of the resulting
factors.
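The relation between the two factors for case (A) is simple arithmetic and can be checked against the tabulated values. The sketch below uses the case (A) entries that are legible in both Table 1 and Table 2; the assignment of entries to values of p is an assumption made from the layout of the tables, and the names are hypothetical.

```python
# (p, Table 1 factor for starting items in %, Table 2 factor for items failed in %)
case_a = [
    (0.01, 377, 3.77),
    (0.30, 153, 45.9),
    (0.40, 155, 62.0),
    (0.50, 157, 78.5),
]

def failed_factor(p, starting_factor):
    """Items-failed factor: p times the factor for the number of starting items."""
    return p * starting_factor

# Largest discrepancy between p * (Table 1 entry) and the Table 2 entry.
largest_gap = max(abs(failed_factor(p, f1) - f2) for p, f1, f2 in case_a)
```

For example, at p = .01 the non-parametric procedure starts 3.77 times as many items but fails only 1% of them, that is, 3.77% of the number failed when the whole of the original sample is used.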

Next consider case (B). The first step in the analysis for this case consists in
obtaining the asymptotically most efficient results. These derivations are
contained in Theorems 4 and 5. The Corollary to Theorem 5 contains an
expression for the asymptotic efficiency of the non-parametric results for case (B).
The factor by which the original number of starting items must be multiplied
to obtain "information equivalent" non-parametric results is obtained in the
same way as for case (A). Table 1 lists values of this factor. In this case both the
number of starting items and the number of items failed are slightly increased
by use of the "equivalent" non-parametric results. The factor by which the
number of items failed is increased equals the corresponding factor for the
increase in number of starting items. For convenience of reference, however, values
of this factor are also given in Table 2. If the variance of the normal population
were unknown, the asymptotic efficiency of the non-parametric results would be
at least as great as that obtained for case (B), and likely greater.


TABLE 1

Asymptotic ratio of total numbers of items tested
(Non-parametric test over most efficient test)

  Case     .01     .02     .05     .10     .20     .30     .40     .50     .70
  (A)      377%                                    153%    155%    157%
  (B)                                              114%            128%    164%
  (C)      111%    114%    118%    122%    129%            148%

Finally consider case (C). Let $p$ be replaced by $\beta$ in Theorem 5 while the value
of $\beta$ corresponding to a given value of $p$ is defined by the relation in Theorem
6. By suitable choices for the values of $\beta$ and $O(\sqrt{n})$ in Theorem 5, it is possible
to obtain asymptotically most efficient results for the population $100p\%$ point
when the population variance is unknown and only the $\beta n + O(\sqrt{n})$ smallest
values of the sample are available. These results are presented in Theorem 6.
The Corollary to Theorem 6 contains an expression for the asymptotic efficiency
of the non-parametric results as compared with the corresponding results of
Theorem 6. The factor by which the number of starting items must be increased
to obtain “equivalent” non-parametric results is computed as in cases (A) and
(B). Table 1 contains values of this factor. The value of $\beta$ represents the fraction
of starting items which are failed if the estimates and tests of Theorem 6 are
used. Table 2 contains corresponding values of $\beta$ for certain values of $p$. The
factor by which the number of items failed is decreased equals $p/\beta$ times the













factor by which the number of starting items was increased to obtain the “equiv¬ 
alent” non-parametric results. Table 2 presents values of this factor. 

The results of Theorem 6 furnish an asymptotically efficient method of estimating
and testing the smaller population percentage points while only failing
a small percentage of the starting items (for the case of normality). Since a larger 
number of items are failed and much more work is required for computing the 
necessary statistics, however, this method is not necessarily preferable to the 
non-parametric method from the viewpoint of “information" per unit cost. In 
many cases the difference in cost will be slight. Since the non-parametric results 
are valid under much more general conditions, they would seem to be preferable 
for these cases. 


TABLE 2

Asymptotic ratio of numbers of items failed
(Non-parametric test over most efficient test)

β      .0113  .0234  .0612  .130   .287   .476   .70
p      .01    .02    .05    .10    .20    .30    .40    .50
(A)    3.77%  5.40%  9.50%  16.0%  30.2%  45.9%  62.0%  78.5%
(B)                         105%   109%   114%   120%   128%
(C)    99%    98%    96%    94%    90%    88%    85%



3. Definition of asymptotic efficiency. In this section the $n$ observations are
assumed to be a sample from a normal population. Let the $100p\%$ point of the
population be denoted by $\theta_p$. Several classes of results for investigating $\theta_p$ are
considered in this paper. For example, the non-parametric estimates and tests
represent one class; the asymptotically most efficient results based on the entire
sample (population variance unknown) represent another class; etc. The results
considered consist of point estimates of $\theta_p$, confidence intervals for $\theta_p$, and
significance tests for $\theta_p$ based on these confidence intervals. For a specified class,
every point estimate and every endpoint of a confidence interval (a one-sided
confidence interval has only one endpoint) consists of some statistic $T$ whose
variance is of the form $\sigma_T^2/n + o(1/n)$ for large $n$. Here $\sigma_T^2$ is independent of $n$
and has the same value for all statistics $T$ of the class. Also for every such
statistic $T$ the quantity

$$\sqrt{n}\,(T - \theta_p)/\sigma_T$$

has a distribution which is asymptotically normal with unit variance and some
finite mean $A$ which is independent of the unknown parameters of the normal
population. By suitable choice of $T$, the mean $A$ can be made to have any
specified value.








Now let us define the asymptotic efficiency of the class of non-parametric
results as compared to a class of results of the type defined by (A), (B) or (C).
Let the non-parametric results be based on $n$ sample values while the other class
of results is based on $m$ sample values. Let the common value of $\sigma_T^2$ for the
non-parametric results be denoted by $\sigma_1^2$ while the common value of this quantity for
the other class is denoted by $\sigma_2^2$. If $\sigma_1^2/n = \sigma_2^2/m$ when $m = nE$, then the
asymptotic efficiency of the non-parametric results (compared to the specified class
of results) is defined to be $100E\%$. For the situations considered in this paper,
$E$ is independent of $m$ and the parameters of the normal population.

Asymptotic efficiency, as defined in the preceding paragraph, has the property
that the statistic (or statistics) yielded by a non-parametric result based on $n$
sample values has approximately the same distribution as the corresponding
statistic (or statistics) based on $m$ sample values from the specified class if $m = nE$
($n$ large). For example, consider a non-parametric unbiased estimate $T_1$ of $\theta_p$
based on $n$ sample values and an unbiased estimate $T_2$ of $\theta_p$ from the specified
class based on $m$ sample values. Then, if $m = nE$, the distributions of

$$\sqrt{n}\,(T_1 - \theta_p)/\sigma_1, \qquad \sqrt{n}\,(T_2 - \theta_p)/\sigma_1$$

are asymptotically identical (note that $\sigma_1^2/n = \sigma_2^2/m$). Similarly for the
endpoints of confidence intervals. Consequently the power functions of significance
tests based on corresponding confidence intervals are asymptotically identical
if $m = nE$. It would therefore appear that the definition chosen for asymptotic
efficiency is suitable for the situations to which it is applied.

4. Notation. In this paper $t(1), \cdots, t(n)$ will represent the values of the set
of all $n$ observations arranged in increasing order of magnitude. Then

$$t(1), \cdots, t(r)$$

are the $r$ smallest values of the set of $n$ observations. The notation $t(r)$ has
meaning only if $r$ is an integer such that $1 \le r \le n$. Often, however, expressions of
the form $t[pn + O(\sqrt{n})]$ will be encountered. In what follows, an expression of
the form $t(z)$ has the interpretation $t(\text{largest integer} \le z)$. For example,

$$t(487\tfrac{1}{2}) = t(487).$$

Also the $r = pn + O(\sqrt{n})$ smallest observations are frequently referred to;
here $r$ is interpreted to be the largest integer contained in $pn + O(\sqrt{n})$; etc.
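The floor convention for $t(z)$ can be illustrated by a short sketch. The function name `order_statistic` is hypothetical and is introduced here only for illustration; it is not part of the paper.

```python
import math


def order_statistic(sample, z):
    """Return t(z): the order statistic whose index is the largest integer <= z.

    Indices are 1-based, as in the text, so t(1) is the smallest observation.
    """
    ordered = sorted(sample)
    r = math.floor(z)
    if not 1 <= r <= len(ordered):
        raise ValueError("t(r) has meaning only if 1 <= r <= n")
    return ordered[r - 1]
```

For instance, `order_statistic(values, 487.5)` and `order_statistic(values, 487)` select the same observation whenever `values` holds at least 487 items.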

5. Theorems and derivations. First let us consider some well known estimates
and tests of the population percentage points which are based on statistics of 
the type used for the sign test. These estimates and tests are valid under ex¬ 
tremely general conditions. It is not necessary that the observations be drawn 
from the same population or even that any two observations come from the 
same population. Population percentage points are not necessarily unique. The 
strongest continuity restriction imposed is that the population cdf be continuous 
at the percentage point considered. These results follow from 





Theorem 1. Let $t(1), \cdots, t(n)$ represent the values of $n$ observations arranged
in increasing order of magnitude. The $n$ observations are statistically independent
and from populations which satisfy the conditions:

(I) The populations have at least one $100p\%$ point in common.

(II) If the populations have only one common $100p\%$ point, the cdf of each
population is continuous at that point.

Let $\theta_p$ denote the value of the common $100p\%$ point if it is unique, or the open
interval of common $100p\%$ points otherwise (i.e., the interval of common $100p\%$ points
with its endpoints deleted). Then asymptotically ($n \to \infty$)

(i) $t(pn)$ is a median estimate of $\theta_p$,

(ii) $\Pr\{t[pn + K_\alpha\sqrt{np(1-p)}] < \theta_p\} = \Pr\{t[pn + K_\alpha\sqrt{np(1-p)}] \le \theta_p\} = \alpha$,

where $K_\alpha$ is the standardized normal deviate exceeded with probability $\alpha$. Relations
(i) and (ii) are approximately satisfied if $pn > 5$ and $p \le \tfrac{1}{2}$.

Proof. This theorem is a direct application of the binomial theorem. Conditions
(I) and (II) assure that the equality between the probabilities in (ii)
holds. Relations (i) and (ii) are obtained by using the normal approximation to
the binomial theorem; this approximation is reasonably accurate if $pn > 5$ and
$p \le \tfrac{1}{2}$ (see [2]).

The non-parametric confidence intervals investigated are of the forms

$$t[pn + B_1\sqrt{n} + o(\sqrt{n})] < \theta_p, \qquad t[pn + B_2\sqrt{n} + o(\sqrt{n})] > \theta_p,$$
$$t[pn + B_1\sqrt{n} + o(\sqrt{n})] < \theta_p < t[pn + B_2\sqrt{n} + o(\sqrt{n})] \quad (B_1 < B_2),$$

(these intervals have the same confidence coefficient if $<$ is replaced by $\le$ and
$>$ by $\ge$). The significance tests considered are those obtained from these
confidence intervals while the point estimates of $\theta_p$ are based on single order
statistics of the form $t[pn + B\sqrt{n} + o(\sqrt{n})]$.

When $\theta_p$ is an open interval, (i) and (ii) need interpretation. The meaning of
(i) is that the probability of $t(pn)$ exceeding every value of $\theta_p$ has the value $\tfrac{1}{2}$
and that the probability of it being less than all values of $\theta_p$ also has the value $\tfrac{1}{2}$.
The inequality $t[pn + K_\alpha\sqrt{np(1-p)}] \le \theta_p$ has the interpretation that every
value of $\theta_p$ is greater than or equal to $t[pn + K_\alpha\sqrt{np(1-p)}]$. Similarly for
$t[pn + K_\alpha\sqrt{np(1-p)}] < \theta_p$.

The purpose in introducing the case where $\theta_p$ is an open interval was to point
out that situations where population percentage points are not unique cause
little difficulty if suitably interpreted.

Non-parametric results of the type considered in Theorem 1 are also available
when the sample size is not large. For any sample size $n$, if the conditions of
Theorem 1 are satisfied,

$$\Pr[t(r) < \theta_p] = \Pr[t(r) \le \theta_p] = \sum_{i=r}^{n} \binom{n}{i} p^i (1-p)^{n-i}.$$

The probability relations in Theorem 1 were obtained by approximating this
summation for large $n$. By suitable choice of $r$, confidence intervals and
significance tests with a wide range of satisfactory confidence coefficients and
significance levels can usually be obtained for a given value of $n$.
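The exact binomial relation and its large-sample approximation can be compared numerically. The following is a minimal sketch; the function names are hypothetical, and the approximation is applied without a continuity correction, which is an assumption of this illustration rather than a statement of the paper.

```python
from math import comb, sqrt
from statistics import NormalDist


def prob_order_stat_below(n, r, p):
    """Exact Pr[t(r) < theta_p] = sum_{i=r}^{n} C(n, i) p^i (1 - p)^(n - i)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r, n + 1))


def prob_normal_approx(n, r, p):
    """Normal approximation to the same probability (no continuity correction).

    Writing r = pn + K * sqrt(n p (1 - p)), the approximation is the
    probability that a standard normal deviate exceeds K.
    """
    k = (r - n * p) / sqrt(n * p * (1 - p))
    return 1 - NormalDist().cdf(k)
```

Scanning `r` with `prob_order_stat_below` for a fixed `n` shows which confidence coefficients are attainable exactly for that sample size.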

The above discussion emphasizes the generality of application of the non- 
parametric estimates and tests. For most practical situations, however, it is 
permissible to assume that the observations are a random sample from a popula¬ 
tion which has a probability density function that is non-zero over the range of 
definition and differentiable several times. Then asymptotically $t(pn)$ is also a
mean estimate of $\theta_p$ (which is now necessarily a single point). Moreover, the
asymptotic distribution of $t[pn + C\sqrt{n} + o(\sqrt{n})]$ can be found in terms of $p$,
$C$, $\theta_p$, and the value of the probability density function at $\theta_p$. These results are
a consequence of

Theorem 2. Let the population from which the $n$ sample values were drawn have
a pdf $f(t)$ such that $f(t) \ne 0$ over its range of definition and $f'(t)$ exists and is
continuous in some neighborhood of $t = \theta_p$. Then the variable

$$\sqrt{\frac{n}{p(1-p)}}\; f(\theta_p)\,\{t[pn + C\sqrt{n} + o(\sqrt{n})] - \theta_p\}$$

has a distribution which approaches the normal distribution with mean

$$C/\sqrt{p(1-p)}$$

and unit variance as $n \to \infty$.

Proof. If $pn$ is replaced by $pn + C\sqrt{n} + o(\sqrt{n})$, the method used to prove
this theorem is completely analogous to the proof presented on pp. 368-69 of [3].

Now let us consider the asymptotically most efficient results for estimating
and testing $\theta_p$ based on the entire set of observations for the case of a sample
from a normal population (population variance unknown).

Theorem 3. Let the $n$ observations be a sample from a normal population
(unknown variance $\sigma^2$). Asymptotically the most efficient point estimates, confidence
intervals and significance tests for $\theta_p$ using all the observations are those based on
the non-central t-statistic. The value of $\sigma_T^2$ (see Section 3) for these results based on
the non-central t-statistic is $\sigma^2(1 + K_p^2/2)$.

Corollary. For case (A) the asymptotic efficiency of the non-parametric results
equals

$$\frac{100\,(1 + K_p^2/2)}{2\pi p(1-p)\exp(K_p^2)}\ \%.$$

Proof. The maximum likelihood estimate of $\theta_p$ based on all $n$ sample values is

(1) [iw - -1). 

This quantity is equivalent to the non-central t-statistic, as can be seen by
multiplying and dividing $[(1) - \theta_p]$ by

From maximum likelihood theory, (1) is an efficient estimate of $\theta_p$.
Asymptotically ($n \to \infty$) the variance of (1) is of the form

$$\sigma^2(1 + K_p^2/2)/n + o(1/n),$$

and it is easily seen that the variance of an endpoint of a confidence interval for
$\theta_p$ based on the non-central t-statistic is also of this form. The corollary follows
from combining Theorem 2 with Theorem 3.
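The Corollary to Theorem 3 can be evaluated numerically; its reciprocal is the factor by which the number of starting items must be multiplied in case (A). The sketch below uses hypothetical function names and takes $K_p$ from the standard library normal quantile function. It is an illustration under those assumptions, not the computation used to prepare Table 1.

```python
from math import exp, pi
from statistics import NormalDist


def upper_deviate(p):
    """K_p: the standardized normal deviate exceeded with probability p."""
    return NormalDist().inv_cdf(1.0 - p)


def efficiency_case_a(p):
    """Asymptotic efficiency (as a fraction) of the non-parametric results, case (A)."""
    k = upper_deviate(p)
    return (1.0 + k * k / 2.0) / (2.0 * pi * p * (1.0 - p) * exp(k * k))


def factor_case_a(p):
    """Ratio of sample sizes (non-parametric over most efficient), case (A)."""
    return 1.0 / efficiency_case_a(p)
```

At $p = .50$ the deviate $K_p$ vanishes and the factor reduces to $\pi/2$, in agreement with the entry 157% of Table 1.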

Next let us investigate the situation where only the $r = pn + O(\sqrt{n})$ smallest
values of a sample of size $n$ from a normal population with mean $\mu$ and variance
$\sigma^2$, denoted by $N(\mu, \sigma^2)$, are available. First let us consider the asymptotic
distribution of

$$\left[\frac{\sum_{i=1}^{r} t(i) + 2a_p(n-r)\,t(r)}{r + 2a_p(n-r)} - \mu + \frac{\sigma(n-r)(b_p + 2a_pK_p)}{r + 2a_p(n-r)}\right]\frac{\sqrt{r + 2a_p(n-r)}}{\sigma}, \tag{2}$$

where

$$a_p = \frac{K_p}{2\sqrt{2\pi}\,(1-p)\exp(K_p^2/2)} + \frac{1}{4\pi(1-p)^2\exp(K_p^2)},$$

$$b_p = \frac{1}{\sqrt{2\pi}\,(1-p)\exp(K_p^2/2)}.$$

This distribution is given by

Theorem 4. Let $t(1), \cdots, t(r)$ be the $r = pn + O(\sqrt{n})$ smallest values
(arranged in increasing order of magnitude) of a sample of size $n$ from $N(\mu, \sigma^2)$. Then
asymptotically ($n \to \infty$) the distribution of (2) is $N(0, 1)$.

Corollary. Let $r = pn + C\sqrt{n} + o(\sqrt{n})$. Then as $n$ increases the distribution
of

$$\left[\frac{\sum_{i=1}^{r} t(i) + 2a_p(n-r)\,t(r)}{r + 2a_p(n-r)} - \mu + \frac{(1-p)(b_p + 2a_pK_p)}{p + 2a_p(1-p)}\,\sigma\right]\frac{\sqrt{r + 2a_p(n-r)}}{\sigma}$$

approaches the normal distribution with unit variance and mean

$$C(b_p + 2a_pK_p)/[p + 2a_p(1-p)]^{3/2}.$$

Proof. The proof of this theorem is long and will be deferred to section 6 of
the paper.

If the value of $\sigma$ is known, the Corollary to Theorem 4 can be used to obtain
point estimates, confidence intervals and significance tests for any population
percentage point (including $\mu$). The resulting estimates and tests are
asymptotically most efficient. This follows from

Theorem 5. Consider the $r = pn + O(\sqrt{n})$ smallest values of a sample of size
$n$ from $N(\mu, \sigma^2)$ where $\sigma$ is known. Asymptotically ($n \to \infty$) the variance of every
unbiased estimate of $\mu$ based on only $t(1), \cdots, t(r)$ and $\sigma$ is greater than or equal
to a quantity of the form

$$\sigma^2/n[p + 2a_p(1-p)] + o(1/n).$$

Corollary. For case (B) the asymptotic efficiency of the non-parametric results is

$$\frac{100\exp(-K_p^2)}{2\pi p(1-p)\,[p + 2a_p(1-p)]}\ \%.$$

Proof. The proof of this theorem is similar to the proof presented for
Theorem 4 and will be given in section 6 following the proof of Theorem 4.
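As a numerical check on the Corollary to Theorem 5, the case (B) factor can be computed from $a_p$ as defined above. This is a minimal sketch with hypothetical function names; it assumes the efficiency is $\exp(-K_p^2)$ divided by $2\pi p(1-p)[p + 2a_p(1-p)]$, the form obtained by combining Theorem 2 with the bound of Theorem 5.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def upper_deviate(p):
    """K_p: the standardized normal deviate exceeded with probability p."""
    return NormalDist().inv_cdf(1.0 - p)


def a_coef(p):
    """a_p as defined in the text preceding Theorem 4."""
    k = upper_deviate(p)
    first = k / (2.0 * sqrt(2.0 * pi) * (1.0 - p) * exp(k * k / 2.0))
    second = 1.0 / (4.0 * pi * (1.0 - p) ** 2 * exp(k * k))
    return first + second


def factor_case_b(p):
    """Ratio of sample sizes (non-parametric over most efficient), case (B)."""
    k = upper_deviate(p)
    bound_term = p + 2.0 * a_coef(p) * (1.0 - p)  # bracket in Theorem 5
    return 2.0 * pi * p * (1.0 - p) * exp(k * k) * bound_term
```

Under these assumptions the function gives roughly 1.05, 1.14 and 1.29 at $p = .10$, $.30$ and $.50$, close to the case (B) entries 105%, 114% and 128%.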

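Let $p$ be replaced by $\beta$ in Theorem 4. Even if $\sigma$ is unknown, asymptotically
most efficient estimates and tests can be obtained for the $100p\%$ point of the
population if $\beta$ is defined by

$$K_p = (1-\beta)(b_\beta + 2a_\beta K_\beta)/[\beta + 2a_\beta(1-\beta)]. \tag{3}$$

Theorem 6. Let $p$, $(0 < p < \tfrac{1}{2})$, be given and $\beta$ defined by (3). Let $t(1), \cdots, t(r)$
be the $r = \beta n + C\sqrt{n} + o(\sqrt{n})$ smallest values of a sample of size $n$ from a normal
population. Then asymptotically

$$\Pr\left\{\frac{\sum_{i=1}^{r} t(i) + 2a_\beta(n-r)\,t(r)}{r + 2a_\beta(n-r)} < \theta_p\right\} = \frac{1}{\sqrt{2\pi}}\int_{C(b_\beta + 2a_\beta K_\beta)/[\beta + 2a_\beta(1-\beta)]^{3/2}}^{\infty} e^{-x^2/2}\,dx.$$

Corollary. For case (C) the asymptotic efficiency of the non-parametric results is

$$\frac{100\exp(-K_p^2)}{2\pi p(1-p)}\left[\beta + \frac{K_\beta\exp(-K_\beta^2/2)}{\sqrt{2\pi}} + \frac{\exp(-K_\beta^2)}{2\pi(1-\beta)}\right]^{-1}\ \%.$$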


Proof. Theorem 6 is an immediate consequence of relation (3) and the Corol¬ 
lary to Theorem 4. The Corollary to Theorem 6 follows from Theorem 2 and 
Theorem 6. 
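Relation (3) defines $\beta$ only implicitly, so a numerical solution is needed to pass from $p$ to $\beta$. The sketch below solves (3) by bisection and then evaluates the case (C) factor. All function names are hypothetical, and the bisection assumes that the right member of (3) decreases as $\beta$ increases from $p$ toward 1; this was checked only at a few values and is not established in the paper.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def upper_deviate(q):
    """Standardized normal deviate exceeded with probability q."""
    return NormalDist().inv_cdf(1.0 - q)


def a_coef(q):
    k = upper_deviate(q)
    return (k / (2.0 * sqrt(2.0 * pi) * (1.0 - q) * exp(k * k / 2.0))
            + 1.0 / (4.0 * pi * (1.0 - q) ** 2 * exp(k * k)))


def b_coef(q):
    k = upper_deviate(q)
    return 1.0 / (sqrt(2.0 * pi) * (1.0 - q) * exp(k * k / 2.0))


def relation_3_rhs(beta):
    """Right member of relation (3) as a function of beta."""
    a, b, k = a_coef(beta), b_coef(beta), upper_deviate(beta)
    return (1.0 - beta) * (b + 2.0 * a * k) / (beta + 2.0 * a * (1.0 - beta))


def solve_beta(p, tol=1e-10):
    """Solve relation (3) for beta by bisection on (p, 1)."""
    if not 0.0 < p < 0.5:
        raise ValueError("p must satisfy 0 < p < 1/2")
    target = upper_deviate(p)
    lo, hi = p, 1.0 - 1e-9
    # Assumption: relation_3_rhs is decreasing in beta on this interval.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if relation_3_rhs(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def factor_case_c(p):
    """Ratio of sample sizes (non-parametric over most efficient), case (C)."""
    beta = solve_beta(p)
    k = upper_deviate(p)
    bracket = beta + 2.0 * a_coef(beta) * (1.0 - beta)
    return 2.0 * pi * p * (1.0 - p) * exp(k * k) * bracket
```

Under these assumptions `solve_beta(0.10)` is close to the value .130 listed in Table 2, and `factor_case_c(0.40)` is close to the entry 148% of Table 1.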


6. Long proofs. This section contains the long proof of Theorem 4 and the
related proof of Theorem 5.

6.1. Proof of Theorem 4. If $t(r)$ is such that

y - g <(r) ^ y - n“*"“, 

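the ratio of the value of the joint probability density function $f$ of $t(1), \cdots, t(r)$
to the value of the function

$$\frac{n!\,(1-p)^{n-r}}{(n-r)!}\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{r}\exp\left\{-\frac{1}{2}\sum_{i=1}^{r}\left[\frac{t(i)-\mu}{\sigma}\right]^2 - (n-r)a\left[\frac{t(r)-\mu}{\sigma} + K_p\right]^2 - (n-r)b\left[\frac{t(r)-\mu}{\sigma} + K_p\right]\right\} \tag{4}$$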





is of the form $1 + o(1)$. Here (and in the remainder of section 6) $a = a_p$, $b = b_p$.
Also, for large n and any positive e, the integral of / over the ranges of the 
1(1), ,t{r - 1) and for l(r) between n - Kp<T - n~^‘ and n - Kpo- + 
differs from unity by a quantity which is of the order o(l), i.e., a quantity which 
—>0 as n —> 00 . 

Now consider the moment generating function of (2), i.e., the expected value
of $\exp[\theta \cdot (2)]$. In evaluating this function of $\theta$, let the range of integration
of $t(r)$ (i.e., the range after the other variables have been integrated out) be
subdivided into the five intervals
— CO to M — D(t, II — Da to n — Kpa — 

M to u. - Kpa + 

H — Kpa + to ft Da, ft + Da to co. 

Here D is a positive constant which is independent of n and such that 

(l/H)"“'(l/p)^'[V(l - p)]"-" < expF- 1 L l + ? g g p) l 

L Vr + 2a(n — r) J 

for n sufficiently large and 

D> \ Kp\ + n-^i^ya, 1 - N(jy) = N{-D) < e-'“>'ID, 

where 

^ L •‘<1 

First let us consider the interval ft — Kpa — to p — Kpa + Using 

(4) in place of /, completing the square in the exponent, making the change of 
variable 

x(i) = t{i) — 9/\^r + 2a('n — r) {%= I, •••, r), 
integrating a:(l), • , x{r — 1) over their ranges and then x(r) over the interval 

ft — Kpa — — d/-\/T + 2a(n — r) to 

ft — Kpa + — 9/\/r -f 2a(n — r), 

an expression of the form

$$\exp(\theta^2/2) + o(1) \tag{5}$$

is obtained. From the above results, this expression differs from the corresponding
integration of $f$ by a term of order $o(1)$; hence the contribution to the mgf
for the interval considered is of the form (5).

Next consider the interval ft — KpO -f to ft Da. After 1(1), • • • , 
l(r — 1) have been integrated out, the integrand becomes 
nl / r tjr) - ft _ 6 

(r — 1) i(?i — r)! \ 1_ a y/r 2a(n — r) 

2ea(n - r) V tjr) - ft , „ I , b9(w - r) ) 
Vr -|- 2a{n — r) L ^ J Vr -H 2a(n — r) J 



(6) 





By writing {N[(t(f) — /x)/o- — 0/-\/r + 2a{n — r)]r ^ the form 
j^iV + o(l))/\/r + 2a(n — r) 

. N VS exp 


and maximizing exp {29a(n — r)[tO’) — + 2a(n — r)} with respect to 

t{r) in the specified interval, it is seen that the value of (6) is less than an expres¬ 
sion of the form 

for n sufficiently large. Differentiation shows that {iV’[]}'’~^{l — iV'[ is a 
decreasing function of t(r) in the specified mterval if n is large enough. Also, if 
t{r) = n — Kp<r + for large n the value of 


. ' f(r) - M 

_ V _ 


(r-l)n-"/10 






is less than a constant which is less than unity. Thus the value of (6) is less than 
a quantity of the form 

which in turn is less than an expression of the form 
CiVn exp (-C?n''“) -f- o(l) 

for n sufficiently large. Thus the integral of (6) over the specified interval is of 
the order o(l). An analogous proof shows that the contribution to the mgf for 
the interval n — Da to n — Kpa — is also of order o(l). 

Finally consider the interval $\mu + D\sigma$ to $\infty$. For large $n$ the integral of (6)
over this interval is less than an expression of the form


m 1“(„ f^)i »p {-«» - 

i.e., the contribution to the mgf for this interval is of the order $o(1)$ since the
coefficient of the integral is less than an expression of the form $C\sqrt{n}$. The upper
limit (7) was obtained by replacing

N ([f(r) — iA/<f — d/\/r + 2a(n — r)} by 1, 





(1/7))""'' (1/p)^”^ [1/(1 - p)]"“' exp [-0(n - r) (b + 2aKp)l^/r + 2a(7i - r)\ 

by 1. 


A similar type proof shows that the integral of (6) from $-\infty$ to $\mu - D\sigma$ is also
of the order $o(1)$.

Thus the mgf of (2) is of the form (5) for large $n$ and Theorem 4 is verified.

6.2. Proof of Theorem 5. Let us consider a single sample value from the
multivariate population consisting of the $r$ smallest order statistics of a sample of size
$n$ from $N(\mu, \sigma^2)$, where $\sigma$ is known. Then the variance of every unbiased estimate
of $\mu$ based on this sample and the value of $\sigma$ is greater than or equal to the
reciprocal of


( 8 ) 



fdt{l) • • • dtir) 



log/ 

diP 


fdtO) • ■ • dt{r), 


where $f$ is the joint pdf of the $r$ smallest order statistics of a sample of size $n$
from $N(\mu, \sigma^2)$. For proof of this statement see pp. 480-81 of [3]. In the lower part
of (8) the variables $t(1), \cdots, t(r-1)$ can be integrated out leaving an explicit
function of $t(r)$ to be integrated from $-\infty$ to $\infty$. To evaluate this integral for
large $n$, choose some large but fixed interval $\mu - D\sigma$ to $\mu + D\sigma$ as was done in
the proof of Theorem 4. Using a method similar to that presented on pp. 368-69
of [3], the value of the integral for the interval $\mu - D\sigma$ to $\mu + D\sigma$ is found
to be of the form

$$n[p + 2a(1-p)]/\sigma^2 + o(n).$$

A procedure analogous to that used in the latter part of section 6.1 shows that
integration outside this interval yields an expression of order $o(n)$.


REFERENCES

[1] Paul H. Jacobson, "The relative power of three statistics," Jour. Amer. Stat. Assn., Vol. 42 (1947), pp. 575-584.

[2] Paul G. Hoel, Introduction to Mathematical Statistics, John Wiley and Sons, 1947, p. 45.

[3] Harald Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946.



ON THE RELATIVE EFFICIENCIES OF BAN ESTIMATES¹

By Leo Katz

Michigan State College
1. Introduction. J. Neyman [3] defined BAN (best asymptotically normal)
estimates as those functions of observed relative frequencies which i) are
consistent, ii) are asymptotically normally distributed, iii) are asymptotically
efficient and iv) possess continuous partial derivatives with respect to each relative
frequency. He suggested the following two problems: first, to determine the
class of estimates which possess the above four properties and second, to
investigate this class of estimates to see whether, and under what conditions, the use of
some of them is preferable to the use of others. Neyman’s paper dealt with the
first problem directly and with the second obliquely. With respect to the first
problem, he showed that two types of $\chi^2$-minimum estimates belong to the
class of BAN estimates as do, obviously, maximum likelihood (ML) estimates.
On the second problem, the $\chi^2$-minimum estimates may be more easily computed
than the corresponding ML estimates in many cases, the ease of computation
being especially pronounced for the modified $\chi^2$ with observed, rather than
expected, relative frequencies in the denominators. The present paper contains
some additional information regarding the relative merits of these estimates.

For simplicity, we shall consider a random variable taking on values

$$x = 0, 1, 2, 3, \cdots$$

with probabilities $p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$ depending on $r$ parameters. In working
with $\chi^2$-minimum estimates, it is almost always necessary to truncate the
probability law, taking

$$f(x) = p(x \mid \theta_1, \theta_2, \cdots, \theta_r), \quad x = 0, 1, \cdots, k-1, \quad\text{and}\quad f(k) = \sum_{x=k}^{\infty} p(x \mid \theta_1, \theta_2, \cdots, \theta_r). \tag{1.1}$$

The ML estimates are asymptotically efficient, i.e., have minimum variance,
with respect to the probability law, $p(x \mid \theta)$, and the $\chi^2$ estimates have the same
property with respect to the truncated p. l., $f(x \mid \theta)$. This suggests that the
optimum variances of the estimates of the parameters of the two in samples of $N$
may differ and, further, that the minimum variance of the $\chi^2$ estimates may
depend essentially upon the choice of $k$. In the course of some unpublished work by
Evelyn Fix and others in the Statistical Laboratory at the University of
California on $\chi^2$ estimation of the parameters of several different p. l.’s the same
anomalous situation occurred repeatedly. When the observed data were fitted

¹ This paper was presented to a joint meeting of the American Mathematical Society
and the Institute of Mathematical Statistics at Boulder, Colorado on September 1, 1949.



by the truncated p. l. with the estimated parameters, the fit appeared to be
improved when $k$ was chosen smaller. This suggested that perhaps, contrary to
intuition, it might be possible to improve the precision of estimation by
choosing $k$ smaller, within certain limits. This paper proves that this notion is false
and that some other explanation of this phenomenon is needed.


2. Relative efficiency. Cramér [1] has shown, simultaneously with Rao [6],
that under mild conditions of regularity, the variance of an unbiased estimate,
$\theta^* = \theta^*(x_1, x_2, \cdots, x_N)$, of a single parameter, $\theta$, where $x_1, x_2, \cdots, x_N$ are
the observed sample, satisfies the following inequality for fixed $N$:

$$D^2(\theta^*) \ge \frac{1}{N\,E\left[\left(\dfrac{\partial \log p(x)}{\partial\theta}\right)^2\right]}, \tag{2.1}$$

the lower bound being attained only by “efficient” statistics. We may take as
a measure of the relative precision attainable in the estimation of the parameter
of the truncated p. l. (1.1) the ratio of the lower bounds (2.1) of variances of the
estimates of the parameters of the original p. l., $p(x \mid \theta)$, and of the truncated
p. l., $f(x \mid \theta)$. We define

$$\text{Rel. Eff.} = \frac{E\left[\left(\dfrac{\partial \log f(x)}{\partial\theta}\right)^2\right]}{E\left[\left(\dfrac{\partial \log p(x)}{\partial\theta}\right)^2\right]}. \tag{2.2}$$

In the case of functions depending on several parameters, $p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$,
and unbiased estimates, $\theta_i^*$, which are functions of the observed relative
frequencies, with non-singular covariance matrix $\|L_{ij}\|$, Cramér [1] showed that
the fixed ellipsoid,

$$N\sum_{i=1}^{r}\sum_{j=1}^{r} \delta_{ij}\,t_i t_j = r + 2, \tag{2.3}$$

where

$$\delta_{ij} = E\left[\frac{\partial \log p(x)}{\partial\theta_i}\,\frac{\partial \log p(x)}{\partial\theta_j}\right],$$

lies wholly within the concentration ellipsoid,

$$\sum_{i=1}^{r}\sum_{j=1}^{r} L^{ij}\,t_i t_j = r + 2, \tag{2.4}$$

where $\|L^{ij}\| = \|L_{ij}\|^{-1}$. The two ellipsoids coincide if and only if the $\theta_i^*$ are
joint efficient estimates of the $\theta_i$. Thus, the covariance matrix of a set of joint
efficient estimates is $\|N\delta_{ij}\|^{-1}$. In this case, we may define separately the
relative efficiency with respect to each of the parameters as in (2.2) or we may
consider the set of estimates for one function to possess greater concentration
than the set for the other function if the fixed ellipsoid (2.3) for the first lies
wholly within the similar ellipsoid for the second. The latter will be the procedure
we adopt in section 5.


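3. Estimation of a single parameter. With $p(x \mid \theta)$ and $f(x \mid \theta)$ defined as in
(1.1), form the difference

$$\phi(k) = E\left[\left(\frac{\partial \log p(x)}{\partial\theta}\right)^2\right] - E\left[\left(\frac{\partial \log f(x)}{\partial\theta}\right)^2\right]. \tag{3.1}$$

The regularity conditions under which the Cramér-Rao inequality (2.1) holds
involve existence of $\partial p(x)/\partial\theta$ for all $x$ and absolute convergence of

$$\sum_{x} \frac{\partial p(x)}{\partial\theta}.$$

Assuming we have a regular case of estimation in Cramér’s sense so that these
conditions hold, we may write

$$\phi(k) = \sum_{x=k}^{\infty} \frac{1}{p(x)}\left[\frac{\partial p(x)}{\partial\theta}\right]^2 - \frac{1}{f(k)}\left[\frac{\partial f(k)}{\partial\theta}\right]^2, \tag{3.2}$$

and, since $\partial f(k)/\partial\theta = \sum_{x=k}^{\infty} (\partial p(x)/\partial\theta)$ by the second of the regularity conditions
above and $f(k) = \sum_{x=k}^{\infty} p(x)$ by (1.1),

$$f(k)\,\phi(k) = \sum_{x=k}^{\infty} p(x)\,\sum_{x=k}^{\infty}\left[\frac{1}{\sqrt{p(x)}}\frac{\partial p(x)}{\partial\theta}\right]^2 - \left[\sum_{x=k}^{\infty} \sqrt{p(x)}\cdot\frac{1}{\sqrt{p(x)}}\frac{\partial p(x)}{\partial\theta}\right]^2. \tag{3.3}$$

By the Cauchy inequality, the right member of (3.3) is non-negative and, since
$f(k) > 0$, it follows that $\phi(k) \ge 0$, with the sign of equality holding only when
$\partial p(x)/\partial\theta$ is proportional to $p(x)$ for all $x \ge k$. In this event, $p(x) = K_\theta\,g(x)$, where
$K_\theta$ is a constant depending on $\theta$. Now, if $g(x)$ is constant, $p(x)$ is a rectangular
p. l. On the other hand, if $g(x)$ is not constant, there are two cases which must
be considered, namely:

a) $p(x) = K_\theta\,g(x)$, $x \ge 0$, and

b) $p(x) = p_1(x \mid \theta)$, $0 \le x < a \le k$,
   $p(x) = K_\theta\,g(x)$, $x \ge a$.

In the first case, $K_\theta = \left[\sum_x g(x)\right]^{-1}$ and is independent of $\theta$, so that we do not
have a case of estimation at all. In the second case, each $p(x)$ for $x \ge a$ is known
a priori to within a multiplicative constant depending on $\theta$ and, hence, no
essential information is lost in truncation. Thus, except in these trivial cases, the
relative efficiency is less than unity.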

It then appears that, in every case of regular estimation, the variance of an
efficient estimate of the parameter of the p. l. $p(x \mid \theta)$ is less than the
corresponding variance for the truncated p. l. $f(x \mid \theta)$ and that, as an immediate
consequence, the ML estimate in general is capable of greater precision than
the $\chi^2$-minimum estimate for fixed $N$. This is the result mentioned in the first
paragraph of section 1. It should be pointed out that the regularity conditions
for the Cramér-Rao inequality are stringent enough to give this result. To
complete the argument for estimation of a single parameter, form the function

$$\psi(k) = p(k)\,\sum_{x=k}^{\infty} p(x)\,\sum_{x=k+1}^{\infty} p(x)\,[\phi(k) - \phi(k+1)], \tag{3.4}$$

where $\phi(k)$ is defined by (3.1). Using (3.1) and (1.1), we may write

$$\phi(k) - \phi(k+1) = \frac{1}{p(k)}\left[\frac{\partial p(k)}{\partial\theta}\right]^2 - \frac{1}{f(k)}\left[\frac{\partial f(k)}{\partial\theta}\right]^2 + \frac{1}{f(k+1)}\left[\frac{\partial f(k+1)}{\partial\theta}\right]^2. \tag{3.5}$$

Making use of (3.5), straightforward algebraic reduction of (3.4) gives

$$\psi(k) = \left[\frac{\partial p(k)}{\partial\theta}\sum_{x=k+1}^{\infty} p(x) - p(k)\sum_{x=k+1}^{\infty}\frac{\partial p(x)}{\partial\theta}\right]^2 \ge 0, \tag{3.6}$$

the sign of equality holding again only for the p. l.’s discussed after (3.3). Since
the first three factors in the right member of (3.4) are positive, it follows that
$\phi(k)$ is a strictly decreasing function of $k$. Thus, the variance of an efficient
estimate of the parameter of a truncated p. l., $f(x)$, depends upon the choice of $k$
and decreases in strictly monotone fashion to the variance of the original p. l.,
$p(x)$, as limit. As a result, the anomalous situation mentioned in the second
paragraph of section 1 does not arise through irregularity in the behavior of this
variance.

4. Poisson and binomial probability laws. The Poisson p. l., $p(x \mid \lambda) = e^{-\lambda}\lambda^x/x!$,
gives immediately

$$\frac{\partial \log p(x)}{\partial\lambda} = \frac{x}{\lambda} - 1, \tag{4.1}$$

whence, from (2.1), we obtain the usual result that the variance of the best
unbiased estimate of $\lambda$ is $\lambda/N$. The truncated p. l. has $\partial \log f(x)/\partial\lambda = (x/\lambda) - 1$
for $x \le (k-1)$, and $\partial \log f(k)/\partial\lambda = p(k-1)/\sum_{x=k}^{\infty} p(x)$.
Thus,

$$E\left[\left(\frac{\partial \log f(x)}{\partial\lambda}\right)^2\right] = \frac{1}{\lambda}\left[\sum_{x=0}^{k-1} p(x) + (\lambda - k)\,p(k-1) + \frac{\lambda\,[p(k-1)]^2}{\sum_{x=k}^{\infty} p(x)}\right]. \tag{4.2}$$

Writing $P(k-1)$ for $\sum_{x=0}^{k-1} p(x)$,

$$\text{Rel. Eff.}_{\text{Poisson}}(k) = P(k-1) + (\lambda - k)\,p(k-1) + \frac{\lambda\,[p(k-1)]^2}{1 - P(k-1)}. \tag{4.3}$$

Values of $p(k)$ and $1 - P(k-1)$ are given directly in Molina's Tables [2] for
integer values of $k$ and $\lambda$ = .001 (.001) .01 (.01) .3 (.1) 15 (1) 100, or may be
obtained indirectly from Pearson’s Tables [4] of the incomplete $\Gamma$-function. In
the classical example of a Poisson p. l. quoted by von Bortkiewicz, relating to
numbers of deaths due to kicks by horses in Prussian Army Corps, $N = 200$
and the average number of deaths per corps-year is .61. Either procedure would
take $k = 2$ and $\lambda = .6$, approximately. Using these values, we find that Rel. Eff.
($k = 2 \mid \lambda = .6$) = .9508, i.e., the loss in efficiency incurred by using a $\chi^2$
estimate rather than a ML estimate is of the order of five per cent.
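The relative efficiency (4.3) is easily evaluated without tables. The sketch below is an illustration only; the function names are hypothetical, and the Poisson probabilities are computed directly from the definition rather than read from the tables cited above.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """p(x | lambda) = e^{-lambda} lambda^x / x!."""
    return exp(-lam) * lam**x / factorial(x)


def rel_eff_poisson(k, lam):
    """Relative efficiency (4.3) for the Poisson law truncated at k (k >= 1)."""
    cum = sum(poisson_pmf(x, lam) for x in range(k))  # P(k - 1)
    last = poisson_pmf(k - 1, lam)                    # p(k - 1)
    return cum + (lam - k) * last + lam * last**2 / (1.0 - cum)
```

With $k = 2$ and $\lambda = .6$, `rel_eff_poisson(2, 0.6)` agrees with the value .9508 quoted in the text to four decimal places.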

The binomial p. l. is given by $p(x \mid n, \theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x}$, $x = 0, 1, \cdots, n$,
where $n$ is a known parameter and $\theta$ is the parameter to be estimated from a
sample of $N$ observations. We obtain directly $E[(\partial \log p(x)/\partial\theta)^2] = n/(\theta(1-\theta))$.
Computing a similar quantity for the truncated p. l. and making use of the
notations $p(x; n) = \binom{n}{x}\theta^x(1-\theta)^{n-x}$ and $P(a; n) = \sum_{x=0}^{a} p(x; n)$, we obtain, after
some reduction,

$$\text{Rel. Eff.}_{\text{binomial}}(k) = \frac{1}{1-\theta}\left[(n-1)\theta\,P(k-3; n-2) + (1 - 2n\theta)\,P(k-2; n-1) + n\theta\,P(k-1; n) + \frac{n\theta\,\{P(k-1; n) - P(k-2; n-1)\}^2}{1 - P(k-1; n)}\right]. \tag{4.4}$$

The form (4.4) is suitable for computation if tables, such as Pearson’s Tables
[5], of the incomplete B-function are available covering a range up to the
parameter $n$. If such tables are not available (4.4) is inconvenient since it involves
probabilities associated with three different binomial laws. In this case we may
use the relations

$$P(a; n) - P(a-1; n-1) = (1-\theta)\,p(a; n-1),$$
$$p(a; n) = \frac{n\theta}{a}\,p(a-1; n-1) \quad\text{and}$$

- 1 ; - 1 ) 

to obtain the alternative form

$$\text{Rel. Eff.}_{\text{binomial}}(k) = P(k-1; n-1) + (n\theta - k)\,p(k-1; n-1) + \frac{n\theta(1-\theta)\,[p(k-1; n-1)]^2}{1 - P(k-1; n-1) + \theta\,p(k-1; n-1)}, \tag{4.6}$$

which involves only the one binomial p. l., $p(x \mid n-1, \theta)$.

As an example, consider the probability situation in which ten independent
trials are made, each with the same probability of success, $\theta$. The number of
successes in each set of ten trials is one observation. On the basis of $N$
observations, we are to estimate $\theta$. We shall investigate the relative efficiencies when
$\theta = .10$. Taking $n = 10$ and $\theta = .10$ in (4.6) we compute the following table of
relative efficiencies for different choices of $k$:

Relative efficiencies of $\chi^2$ estimates in the case of the binomial p. l., $n = 10$, $\theta = .10$

  k    Rel. Eff.
  2    .8993
  3    .9828
  4    .9979
  5    .9998


It is obvious from the table that the loss in efficiency is not great when $k \ge 3$
and, hence, the variances of the $\chi^2$ estimates are practically equal to the variance
of the ML estimate. But, in ordinary practice, $N$, the number of sets of ten trials
each, would have to be over 140 before $k$ could be safely chosen as large as
$k = 3$, and even $k = 2$ requires $N \ge 38$. Cases in which we seek to estimate
parameters on the basis of about 100 observations are not rare; in the present
instance, use of a $\chi^2$ estimate would produce about 11% greater variance than
the use of a ML estimate.
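The table of relative efficiencies can be checked from the alternative form, which involves only the binomial law with parameters $n - 1$ and $\theta$. The following sketch uses hypothetical function names and computes the binomial probabilities directly; it is offered as an illustration, not as the computation originally used.

```python
from math import comb


def binom_pmf(x, n, theta):
    """p(x; n) = C(n, x) theta^x (1 - theta)^(n - x)."""
    return comb(n, x) * theta**x * (1.0 - theta) ** (n - x)


def rel_eff_binomial(k, n, theta):
    """Relative efficiency of the binomial law truncated at k (1 <= k <= n).

    Uses the alternative form, which needs only p(x; n - 1).
    """
    m = n - 1
    cum = sum(binom_pmf(x, m, theta) for x in range(k))  # P(k - 1; n - 1)
    last = binom_pmf(k - 1, m, theta)                    # p(k - 1; n - 1)
    tail = 1.0 - cum + theta * last                      # equals 1 - P(k - 1; n)
    return cum + (n * theta - k) * last + n * theta * (1.0 - theta) * last**2 / tail
```

For $n = 10$ and $\theta = .10$ the function returns values that round to .8993, .9828, .9979 and .9998 for $k$ = 2, 3, 4 and 5.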

The two elementary examples considered in this section provide only very
fragmentary evidence of the need for caution in employing $\chi^2$-minimum
estimates; much numerical work would have to be done to provide any reliable guide
to the relative efficiency of such estimates.


5. Estimation of two or more parameters. Consider the p. l. $p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$,
$x = 0, 1, 2, \cdots$, with ellipsoid of concentration for a set of joint efficient
estimates given by (2.3). The truncated p. l. given by (1.1) has a corresponding
ellipsoid of concentration

(2.3')  $\sum_{i=1}^{r} \sum_{j=1}^{r} \mu'_{ij}\, t_i t_j = r + 2,$

with $\mu'_{ij} = E\left[ \dfrac{\partial \log f(x)}{\partial \theta_i} \dfrac{\partial \log f(x)}{\partial \theta_j} \right]$. We shall show, in this section, that the
ellipsoid (2.3) lies wholly within (2.3'). This is so if the left member of (2.3) is
uniformly greater than the left member of (2.3'), for every choice of the $t_i$,
$i = 1, 2, \cdots, r$. Accordingly, we form the difference,

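(5.1)  $Q(k) = \sum_{i=1}^{r} \sum_{j=1}^{r} (\mu_{ij} - \mu'_{ij})\, t_i t_j.$

Adopting the notations,

$p_i(x) = \dfrac{\partial p(x)}{\partial \theta_i}$ and $f_i(k) = \dfrac{\partial f(k)}{\partial \theta_i},$

we obtain by direct subtraction,

(5.2)  $Q(k) = \sum_{i=1}^{r} \sum_{j=1}^{r} \left[ \sum_{x=k}^{\infty} \dfrac{p_i(x)\, p_j(x)}{p(x)} - \dfrac{f_i(k)\, f_j(k)}{f(k)} \right] t_i t_j.$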


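Equation (5.2) is unchanged if the right member is written in the form

(5.3)  $Q(k) = \sum_{i=1}^{r} \sum_{j=1}^{r} \left[ \sum_{x=k}^{\infty} \left( \dfrac{p_i(x)}{p(x)} - \dfrac{f_i(k)}{f(k)} \right) \left( \dfrac{p_j(x)}{p(x)} - \dfrac{f_j(k)}{f(k)} \right) p(x) \right] t_i t_j.$

If this latter is now written as

(5.4)  $Q(k) = \sum_{i=1}^{r} \sum_{j=1}^{r} f(k) \left[ \dfrac{1}{f(k)} \sum_{x=k}^{\infty} \left\{ \left( \dfrac{p_i(x)}{p(x)} - \dfrac{f_i(k)}{f(k)} \right) \left( \dfrac{p_j(x)}{p(x)} - \dfrac{f_j(k)}{f(k)} \right) \right\} p(x) \right] t_i t_j,$

it is evident that the expression in square brackets in the right hand member is
precisely the mean value of the expression in curly brackets taken over the set
$x \ge k$. If we denote by $E\{g(x)\}$ the expected value of $g(x)$ over the set $x \ge k$,
we have

(5.5)  $Q(k) = f(k) \sum_{i=1}^{r} \sum_{j=1}^{r} E\left\{ \left( \dfrac{p_i(x)}{p(x)} - \dfrac{f_i(k)}{f(k)} \right) \left( \dfrac{p_j(x)}{p(x)} - \dfrac{f_j(k)}{f(k)} \right) \right\} t_i t_j.$

Finally, since the (finite) sum of the expected values is equal to the expected
value of the sum, we have,

(5.6)  $Q(k) = f(k)\, E\left[ \left\{ \sum_{i=1}^{r} \left( \dfrac{p_i(x)}{p(x)} - \dfrac{f_i(k)}{f(k)} \right) t_i \right\}^2 \right].$

Since $f(k) > 0$, $Q(k) \ge 0$. We need only note that $Q(k) = 0$ only if the linear form
in curly brackets in (5.6) is identically zero, i.e., if each coefficient of $t_i$ vanishes.
This can happen only in the trivial cases analogous to those described in Section 3.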

It has been shown that the ellipsoid of concentration of a set of joint efficient
estimates of the parameters of a p. l. lies wholly within the corresponding
ellipsoid of the truncated p. l. Therefore, the best procedure for estimating the
parameters of a truncated p. l. cannot attain the precision of an efficient
procedure for estimating those of the original p. l.

In order to complete the argument for the general case, we form the difference

(5.7)  $Q(k) - Q(k+1) = \sum_{i=1}^{r} \sum_{j=1}^{r} \left[ \dfrac{p_i(k)\, p_j(k)}{p(k)} - \dfrac{f_i(k)\, f_j(k)}{f(k)} + \dfrac{f_i(k+1)\, f_j(k+1)}{f(k+1)} \right] t_i t_j.$

Making use of the two relationships $f(k) = p(k) + f(k+1)$ and
$f_i(k) = p_i(k) + f_i(k+1)$, we have

(5.8)  $Q(k) - Q(k+1) = \dfrac{p(k)\, f(k+1)}{f(k)} \left[ \sum_{i=1}^{r} \left( \dfrac{p_i(k)}{p(k)} - \dfrac{f_i(k+1)}{f(k+1)} \right) t_i \right]^2.$

The right member of (5.8) being positive except in the trivial cases, it is clear
that $Q(k)$ is a strictly monotone function of $k$.


6. Conclusions. It has been shown that the efficiency of $\chi^2$-minimum estimates,
or any other estimates which involve computation in terms of a truncated p. l.,
is necessarily less than the efficiency of corresponding ML or other estimates
based on the original p. l. and, further, that the efficiency increases with the
point of truncation. This was established for estimates of a single parameter and,
also, for joint estimates of several parameters. Examples given indicate that,
in any case of regular estimation, use of $\chi^2$-minimum estimates rather than ML
estimates should be accompanied by an investigation into the loss in efficiency.
The author is indebted to Professor J. Neyman, who suggested the problem. 




UNBIASED ESTIMATES WITH MINIMUM VARIANCE 

By Charles Stein 
University of California, Berkeley 

Summary. Subject to certain restrictions, a characterization of unbiased 
estimates with minimum variance is obtained. For two fairly broad classes
of problems, solutions are given which are more readily applicable. These are
used to obtain such estimates in some particular cases. The applicability of
the results to problems of sequential estimation is pointed out. The problem 
of unbiased estimation is not at present of much practical importance, but 
is of some theoretical interest and has been treated by many statisticians. Also, 
the method used in this paper may be applicable to other problems in statistics. 

1. Introduction. Let $R$ be a space of points $x$, $B$ an additive class of subsets $C$
of $R$ and $\mu$ a measure over $B$ such that $R$ can be represented as the union of a
countable collection of elements of $B$ each of which has finite $\mu$-measure. Let $\Omega$
be a set called the parameter space and let $X$ be a random variable distributed
in accordance with the probability density function $p(x \mid \theta)$ for some $\theta \in \Omega$,
so that for any $C \in B$

$P(X \in C \mid \theta) = \int_C p(x \mid \theta)\, d\mu(x).$

A measurable real-valued function $f(x)$ on $R$ is called an unbiased estimate of the
real-valued function $g(\theta)$ on $\Omega$ if, for every $\theta \in \Omega$

(1)  $E(f(X) \mid \theta) = \int f(x)\, p(x \mid \theta)\, d\mu(x) = g(\theta).$

The problem considered in this paper is that of finding an unbiased estimate
$f^*$ of $g$ which minimizes the variance at $\theta_0$. Since this variance is

(2)  $E([f(X) - g(\theta_0)]^2 \mid \theta_0) = \int [f(x) - g(\theta_0)]^2 p(x \mid \theta_0)\, d\mu(x) = \int [f(x)]^2 p(x \mid \theta_0)\, d\mu(x) - \left[ \int f(x)\, p(x \mid \theta_0)\, d\mu(x) \right]^2,$

this problem is equivalent to minimizing

(3)  $\int [f(x)]^2 p(x \mid \theta_0)\, d\mu(x)$

subject to (1). It will be convenient to introduce the measure

(4)  $\nu(C) = \int_C p(x \mid \theta_0)\, d\mu(x)$

and the probability ratios

(5)  $\pi(x \mid \theta) = \dfrac{p(x \mid \theta)}{p(x \mid \theta_0)}.$

We suppose $\pi(x \mid \theta)$ finite for almost all $x$, and all $\theta$. When we say "for almost
all $x$," we mean "except for a set of $\nu$-measure 0."

In most practical problems, the set $R$ is a subset of some finite-dimensional
Euclidean space and $\mu$ is either ordinary Lebesgue measure or, in the case where
$R$ is countable, counting measure which makes the measure of a set the number
of points it contains. An exception is the application to sequential analysis
considered in section 3 below, in which $R$ is a countable union of sets, each of
which is a subset of a finite dimensional Euclidean space. For the basic notation
and concepts of the theory of integration see Saks [2], Ch. I.

We shall define

(6)  $A(\theta_1, \theta_2) = \int \pi(x \mid \theta_1)\, \pi(x \mid \theta_2)\, d\nu(x),$

and suppose

(7)  $A(\theta, \theta) < \infty$ for all $\theta$.

By Schwarz's inequality this implies that $A(\theta_1, \theta_2) < \infty$ for all $\theta_1, \theta_2$. If (7)
is not true then it may happen that there exists no unbiased estimate with
minimum variance even though there exist unbiased estimates. Consider, for
example, the case where $\Omega$ consists of two points, 0 and 1, and $g(\theta) = \theta$, and

$p(x \mid 0) = 1$ for $0 < x < 1$, and $0$ otherwise,

$p(x \mid 1) = \tfrac{1}{2} x^{-1/2}$ for $0 < x < 1$, and $0$ otherwise,

and $\mu$ is ordinary Lebesgue measure. It is clear that there exist unbiased estimates
of $\theta$ with arbitrarily small positive variance at $\theta = 0$ but there exists none with 0
variance.


2. The principal theorem. In accordance with the usual terminology we denote
by $L_2$ the class of all measurable functions $\phi$ such that

(8)  $\int [\phi(x)]^2\, d\nu(x) < \infty.$

Finally, $G$ is the class of all functions $\psi$ expressible in the form

(9)  $\psi(\theta) = \int \phi(x)\, \pi(x \mid \theta)\, d\nu(x)$ with $\phi \in L_2$.

Theorem 1. If $\pi(x \mid \theta)$ is finite for all $\theta$ and almost all $x$, and (7) is satisfied,
and there exists an unbiased estimate of $g$, then there exists an unbiased estimate
$f^*$ of $g$ which minimizes (3). If $f^*$ has finite variance then any other unbiased
estimate of $g$ with minimum variance at $\theta_0$ is essentially equal to $f^*$, that is, differs
from $f^*$ only on a set of $\nu$-measure 0. A function $f^*$ is an unbiased estimate of $g$ with
minimum variance at $\theta_0$ if and only if there exists a real-valued functional $T$ on $G$
for which

(10)  $T A(\theta, \theta_1) = g(\theta_1)$ for all $\theta_1 \in \Omega$,

(11)  $T \int \phi(x)\, \pi(x \mid \theta)\, d\nu(x) = \int \phi(x)\, f^*(x)\, d\nu(x)$ for all $\phi \in L_2$.

(The preceding sentence does not assume the existence of an unbiased estimate of $g$.)
The minimum variance is $T g(\theta) - [g(\theta_0)]^2$.

Proof. Let $\{f_i\}$ be a sequence of unbiased estimates of $g$ such that

$\lim_{i \to \infty} \int [f_i(x)]^2\, d\nu(x) = \text{g.l.b.} \int [f(x)]^2\, d\nu(x)$

where $f$ ranges over all unbiased estimates of $g$. Then by the weak compactness
of every sphere in $L_2$ (see [1], p. 10) there exists $f^* \in L_2$ and an increasing sequence
$\{n_i\}$ of integers for which

$\int \phi f^*\, d\nu = \lim_{i \to \infty} \int \phi f_{n_i}\, d\nu$ for all $\phi \in L_2$.

Since $\pi(x \mid \theta) \in L_2$ by (7), this implies that $f^*$ is an unbiased estimate of $g$. Also

(12)  $\int [f^*]^2\, d\nu \le \lim_{i \to \infty} \int f_{n_i}^2\, d\nu = \text{g.l.b.} \int f^2\, d\nu.$

Thus $f^*$ is an unbiased estimate of $g$ with minimum variance.

Let $\phi_1 \in L_2$ be such that

(13)  $\int \phi_1(x)\, \pi(x \mid \theta)\, d\nu(x) = 0$ for all $\theta \in \Omega$.

Then, using the $f^*$ defined in the last paragraph, we obtain for any real $\epsilon$

(14)  $0 \le \int (f^* + \epsilon \phi_1)^2\, d\nu - \int [f^*]^2\, d\nu = 2\epsilon \int \phi_1 f^*\, d\nu + \epsilon^2 \int \phi_1^2\, d\nu$

since $f^* + \epsilon \phi_1$ is an unbiased estimate of $g$. Dividing (14) by $\epsilon$ and letting $\epsilon \to 0$
we obtain

(15)  $\int \phi_1 f^*\, d\nu = 0.$

If a function in $G$ can be represented in two ways,

$\int \phi(x)\, \pi(x \mid \theta)\, d\nu(x) = \int \phi'(x)\, \pi(x \mid \theta)\, d\nu(x),$

then $\phi_1 = \phi' - \phi$ satisfies (13) and consequently (15). Thus (11) defines a
functional on $G$ in a consistent way. Also, this functional satisfies (10) since

$T A(\theta, \theta_1) = T \int \pi(x \mid \theta_1)\, \pi(x \mid \theta)\, d\nu(x) = \int \pi(x \mid \theta_1)\, f^*(x)\, d\nu(x) = g(\theta_1).$

By (2) and (11) the minimum variance is

$\int [f^*(x)]^2\, d\nu(x) - [g(\theta_0)]^2 = T \int f^*(x)\, \pi(x \mid \theta)\, d\nu(x) - [g(\theta_0)]^2 = T g(\theta) - [g(\theta_0)]^2.$

To prove the converse, let $f^*$ be any function in $L_2$ for which there exists a
functional $T$ satisfying (10) and (11). By (11) with $\phi(x) = \pi(x \mid \theta_1)$,

$\int f^*(x)\, \pi(x \mid \theta_1)\, d\nu(x) = T \int \pi(x \mid \theta)\, \pi(x \mid \theta_1)\, d\nu(x) = T A(\theta, \theta_1) = g(\theta_1)$

by (10), so that $f^*$ is an unbiased estimate of $g$. Any other unbiased estimate $f$
of $g$ with finite variance at $\theta_0$ is an element of $L_2$. Thus from (1) and (11)
we obtain

$T g(\theta) = \int f f^*\, d\nu = \int [f^*]^2\, d\nu.$

Applying Schwarz's inequality to the middle expression we obtain

$\int [f^*]^2\, d\nu \le \int [f]^2\, d\nu$

with strict inequality unless $f$ is essentially equal to $f^*$.

Corollary 1. Suppose $\pi(x \mid \theta)$ is finite for all $\theta$ and almost all $x$ and (7) holds.
Let $H_1(x, d)$ be the set of all $\theta \in \Omega$ such that $\pi(x \mid \theta) > d$, and let $H$ be the smallest
additive class containing all $H_1(x, d)$. Suppose there exists an additive set function $\lambda$
over $H$ such that there exists a finite collection of parameter points $\theta_k$ and positive
numbers $c_k$ such that

(16)  $\int \pi(x \mid \theta)\, |d\lambda(\theta)| \le \sum_k c_k\, \pi(x \mid \theta_k)$

for almost all $x$, and

(17)  $\int A(\theta, \theta_1)\, d\lambda(\theta) = g(\theta_1).$

Then the unbiased estimate of $g(\theta)$ with minimum variance at $\theta_0$ is

(18)  $f^*(x) = \int \pi(x \mid \theta)\, d\lambda(\theta)$

and the minimum variance is

(19)  $\int g(\theta)\, d\lambda(\theta) - [g(\theta_0)]^2.$

Proof: We need only show that (10) and (11) are satisfied by

$T \psi(\theta) = \int \psi(\theta)\, d\lambda(\theta)$

and (18). But

(20)  $T A(\theta, \theta_1) = \int A(\theta, \theta_1)\, d\lambda(\theta) = g(\theta_1)$

by (17) and

(21)  $T \int \phi(x)\, \pi(x \mid \theta)\, d\nu(x) = \int d\lambda(\theta) \int \phi(x)\, \pi(x \mid \theta)\, d\nu(x) = \int \phi(x)\, d\nu(x) \int \pi(x \mid \theta)\, d\lambda(\theta) = \int \phi(x)\, f^*(x)\, d\nu(x).$

Since each of the functions $\phi(x)$ and $\pi(x \mid \theta)$, considered as a function of $x$ and $\theta$, is
measurable $(BH)$, their product is also. The interchange of order of integration
in (21) is justified by Fubini's Theorem (Saks [2], p. 87) and (16) which by (9)
implies that $\int |d\lambda(\theta)| \int |\phi(x)\, \pi(x \mid \theta)|\, d\nu(x) < \infty$. The equations (20) and (21)
are equivalent to (10) and (11) respectively.

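Corollary 2. Suppose $\pi(x \mid \theta)$ is finite for all $\theta$ and almost all $x$ and (7) holds.
Suppose also that $\Omega$ is a set of real numbers and:

(i) for some $m$, either a positive integer or $+\infty$, $\pi(x \mid \theta)$ is, for almost all $x$,
differentiable $m$ times with respect to $\theta$ at $\theta = \theta_0$,

(ii) for each $n < m$ there exists a finite collection of parameter values $\theta_{n,k}$ and
positive constants $c_{n,k}$ such that

(22)  $\left| \dfrac{\pi^{(n)}(x \mid \theta_0 + \delta) - \pi^{(n)}(x \mid \theta_0)}{\delta} \right| \le \sum_k c_{n,k}\, \pi(x \mid \theta_{n,k})$

for all $\delta$ whose absolute value is sufficiently small and almost all $x$,

(iii) there exist constants $a_n$ such that for all $\theta_1$,

(23)  $g(\theta_1) = \sum_{n=0}^{m} a_n \left[ \dfrac{\partial^n}{\partial \theta^n} A(\theta, \theta_1) \right]_{\theta = \theta_0},$

(iv) there exists a finite collection of parameter values $\theta_k$ and positive constants
$c_k$ such that

(24)  $\left| \sum_{n=0}^{m} a_n \left[ \dfrac{\partial^n}{\partial \theta^n} \pi(x \mid \theta) \right]_{\theta = \theta_0} \right| \le \sum_k c_k\, \pi(x \mid \theta_k).$

Then the unbiased estimate of $g(\theta)$ with minimum variance at $\theta_0$ is

(25)  $f^*(x) = \sum_{n=0}^{m} a_n \left[ \dfrac{\partial^n}{\partial \theta^n} \pi(x \mid \theta) \right]_{\theta = \theta_0}.$

The minimum variance is

(26)  $\sum_{n=0}^{m} a_n \left[ \dfrac{d^n g(\theta)}{d \theta^n} \right]_{\theta = \theta_0} - [g(\theta_0)]^2.$

Proof. We need only show that the functional $T$ defined by

$T \int \phi(x)\, \pi(x \mid \theta)\, d\nu(x) = \sum_{n=0}^{m} a_n \left[ \dfrac{\partial^n}{\partial \theta^n} \int \phi(x)\, \pi(x \mid \theta)\, d\nu(x) \right]_{\theta = \theta_0}$

satisfies (10) and (11) with $f^*$ given by (25). Equation (23) yields (10) immediately.
Also

$T \int \phi(x)\, \pi(x \mid \theta)\, d\nu(x) = \sum_{n=0}^{m} a_n \left[ \dfrac{\partial^n}{\partial \theta^n} \int \phi(x)\, \pi(x \mid \theta)\, d\nu(x) \right]_{\theta = \theta_0} = \sum_{n=0}^{m} a_n \int \phi(x) \left[ \dfrac{\partial^n}{\partial \theta^n} \pi(x \mid \theta) \right]_{\theta = \theta_0} d\nu(x)$

by (9), (i), (22) and Lebesgue's Theorem on term by term integration (Saks
[2], p. 29). Using (24) and Lebesgue's Theorem, we find that this is equal to

$\int \phi(x) \sum_{n=0}^{m} a_n \left[ \dfrac{\partial^n}{\partial \theta^n} \pi(x \mid \theta) \right]_{\theta = \theta_0} d\nu(x) = \int \phi(x)\, f^*(x)\, d\nu(x),$

which completes the proof.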

There is an obvious combination of Corollaries 1 and 2 which will not be
stated explicitly. Also Corollary 2 can be extended to involve differentiation
with respect to several parameters. It would be of considerable interest to
obtain a characterization of all possible functionals $T$ in terms of the usual
operations such as integration and differentiation. Also, the methods used here
should be applicable, with some modifications, to other problems of minimization
subject to an infinite set of side conditions.

Corollary 3. Suppose that subject to the condition of Theorem 1, for $i = 1, 2$,
$f_i^*$ are unbiased estimates of $g_i$ with minimum variance at $\theta_0$. Then $f_1^* + f_2^*$ is
an unbiased estimate of $g_1 + g_2$ with minimum variance at $\theta_0$.

This follows immediately from (10) and (11) in Theorem 1. Actually, the
restriction to problems satisfying the conditions of Theorem 1 is unnecessary,
but we shall not prove this here.



3. Some special cases. We first consider a problem which is of little practical
interest but serves well as an illustration of Corollary 1. Let $X$ be a single
observation from a uniform distribution on the interval $(\theta, \theta + 1)$, i.e.

$p(x \mid \theta) = 1$ if $\theta < x < \theta + 1$, and $0$ otherwise.

We suppose $\theta$ lies in the interval $(-N, N - 1)$ where $N$ is a given positive
integer, and take as the distribution for which the variance is to be minimized

$p(x \mid \theta_0) = \dfrac{1}{2N}$ if $-N < x < N$, and $0$ otherwise.

This is the same as using the original p.d.f. $p(x \mid \theta)$ with $\theta$ a random variable
taking on the values $-N, -N + 1, \cdots, N - 1$ with equal probability. The
measure $\mu$ is of course ordinary Lebesgue measure. Then

(27)  $\pi(x \mid \theta) = 2N$ if $\theta < x < \theta + 1$, and $0$ otherwise

and

(28)  $A(\theta_1, \theta_2) = 2N \times \begin{cases} 0 & \text{if } \theta_1 < \theta_2 - 1 \\ \theta_1 - \theta_2 + 1 & \text{if } \theta_2 - 1 \le \theta_1 < \theta_2 \\ \theta_2 - \theta_1 + 1 & \text{if } \theta_2 \le \theta_1 < \theta_2 + 1 \\ 0 & \text{if } \theta_2 + 1 \le \theta_1. \end{cases}$

For $-N < \theta_1 < N - 1$, equation (17) becomes

(29)  $\int_{\max(-N,\, \theta_1 - 1)}^{\theta_1} (\theta - \theta_1 + 1)\, d\lambda(\theta) + \int_{\theta_1}^{\min(N - 1,\, \theta_1 + 1)} (\theta_1 - \theta + 1)\, d\lambda(\theta) = g(\theta_1)/2N$

and (18) becomes

(30)  $f^*(x)/2N = \lambda(\min[N - 1, x]) - \lambda(\max[-N, x - 1]).$

The reader will not be confused by the use of $\lambda$ as a point function here, and
as a set function in Corollary 1. Using (30) and integration by parts (Saks [2],
p. 102) we can rewrite (29) as

(31)  $\int_{\theta_1}^{\theta_1 + 1} f^*(x)\, dx = g(\theta_1),$

which is merely the condition that $f^*$ be an unbiased estimate of $g$. It is clear
from (31) that $g$ admits an unbiased estimate if and only if it is absolutely
continuous. Differentiating (31) we obtain

(32)  $f^*(\theta + 1) - f^*(\theta) = g'(\theta).$

Consequently the general solution of (31) is

(33)  $f^*(\theta) = \sum_{i=1}^{[\theta] + N} g'(\theta - i) + \gamma(\theta),$

where $\gamma$ is a function of period 1 such that

(34)  $\int_0^1 \gamma(\theta)\, d\theta = 0.$

Here, contrary to the usual convention, $[\theta]$ denotes the largest integer less than $\theta$.
The one of (33) which minimizes the variance at $\theta_0$ is determined by the condition
that there exist $\lambda$ satisfying (30). Let $y$ be any number on the half-closed interval
$(-N, -N + 1]$, and sum (30) for $x = y, y + 1, \cdots, y + 2N - 1$. This yields

(35)  $\dfrac{1}{2N} \sum_{j=0}^{2N-1} f^*(y + j) = \lambda(N - 1) - \lambda(-N).$

Carrying out the same computation on (33) we obtain

(36)  $\dfrac{1}{2N} \sum_{j=0}^{2N-1} \sum_{i=1}^{j} g'(y + j - i) + \gamma(y) = \lambda(N - 1) - \lambda(-N).$

Combining (34) and (35) we find that the proper choice of $\gamma$ is that which gives

[rl+y+l 


1 4. i 

f*{x) = E - [a:] - AT -f i) 


»-o 


(37) 


+ ’ll -') + 


I 2ff—l 


If the limit of (37) as $N \to \infty$ exists, it agrees with Nörlund's simplest definitions
of the principal solution of (32) (see Milne-Thomson [3], formula (2), p. 201)
whenever the latter is applicable. The author has not checked the agreement
with Nörlund's more general definitions.

Next we consider the problem of obtaining an unbiased estimate of $g(\theta)$ with
minimum variance at $\theta_0$ when $X$ consists of $n$ independent observations, each
uniformly distributed over the interval $(0, \theta)$. Here $\theta$ is an unknown positive
number. The result is independent of the choice of $\theta_0$. Clearly a necessary and
sufficient condition for the existence of an unbiased estimate of $g$ is that $g$ be
absolutely continuous. Corollary 1 can be applied to obtain as the best unbiased
estimate $g(Y) + \dfrac{Y}{n}\, g'(Y)$ where $Y = \max(X_1, \cdots, X_n)$. However, this result can
be obtained much more simply by observing that, given any sufficient statistic $Z$,
there exists an unbiased estimate with minimum variance which is a function
only of $Z$. A proof of this is given by Blackwell [4]. But $Y$ is a sufficient statistic,
and the condition that $f^*(y)$ be an unbiased estimate of $g$ is that

$\dfrac{n}{\theta^n} \int_0^\theta f^*(y)\, y^{n-1}\, dy = g(\theta).$

This has as its unique solution that given above.
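The unbiasedness condition just stated can be checked by quadrature. The sketch below assumes the estimate has the form $g(Y) + (Y/n)\, g'(Y)$ with $Y$ the largest of the $n$ observations, whose density is $n y^{n-1}/\theta^n$ on $(0, \theta)$. The choice $g(\theta) = \theta^2$, the values of `n` and `theta`, and the helper names are illustrative assumptions, not taken from the paper.

```python
def expected_value(f, theta, n, steps=200_000):
    """E[f(Y)] for Y = max of n uniform(0, theta) draws, using the density
    n * y**(n - 1) / theta**n and the midpoint rule on (0, theta)."""
    h = theta / steps
    total = 0.0
    for j in range(steps):
        y = (j + 0.5) * h
        total += f(y) * n * y ** (n - 1) / theta**n
    return total * h


def best_unbiased(g, dg, n):
    """Estimate g(Y) + (Y / n) * g'(Y), built from g and its derivative dg."""
    return lambda y: g(y) + (y / n) * dg(y)


def g(t):
    # illustrative target function (an assumption for this sketch)
    return t**2


def dg(t):
    # derivative of the illustrative target function
    return 2 * t


n, theta = 5, 3.0
estimate = best_unbiased(g, dg, n)
# the two printed numbers should agree up to quadrature error
print(expected_value(estimate, theta, n), g(theta))
```

Repeating the call for several values of `theta` gives a quick check that the expectation tracks $g(\theta)$ over the whole parameter range and not merely at one point.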

A similar situation holds when the $X_i$, $i = 1, \cdots, n$, are independently normally
distributed with unknown common mean $\theta$ and unit variance. Here Corollary 1
is not applicable, but Corollary 2 is. The result can again be obtained more
simply as the unique solution of the integral equation

$\sqrt{\dfrac{n}{2\pi}} \int_{-\infty}^{\infty} f_1^*(y)\, e^{-\frac{n}{2}(y - \theta)^2}\, dy = g(\theta)$

with

$f^*(x_1, \cdots, x_n) = f_1^*(y), \quad y = \dfrac{1}{n} \sum_{i=1}^{n} x_i.$

It should be observed that the methods of section 2 are applicable also to
problems of sequential estimation. Let $X_1, X_2, \cdots$ be a sequence of real-valued
random variables such that $(X_1, \cdots, X_n)$ have the joint p.d.f. $p_n(x_1, \cdots, x_n \mid \theta)$
for some unknown $\theta \in \Omega$. Suppose it has been decided to terminate the procedure
on the $m$th observation if $(X_1, \cdots, X_m) \in R_m$ for some given sets $R_m$ in $m$-space,
and suppose these sets are so chosen that the probability of termination is 1
for all $\theta$. Then we can define the space $R = \bigcup_m R_m$, the union of the $R_m$, the
measure

$\mu(A) = \sum_m \mu_m(A \cap R_m)$

for any set $A \subset R$ for which the intersections $A \cap R_m$ are Borel sets, where $\mu_m$ is
ordinary $m$-dimensional Lebesgue measure, and the probability density functions

$p(x \mid \theta) = p_m(x_1, \cdots, x_m \mid \theta)$ if $x = (x_1, \cdots, x_m) \in R_m$.

The previous results are then applicable. Most of the familiar results in the
theory of statistical inference can be extended to sequential problems in the
same way. Of course the interesting and difficult problems of sequential analysis
are usually concerned chiefly with the appropriate choice of the regions $R_m$.

4. Connections with the work of other authors. Many lower bounds for the
variance of an unbiased estimate were obtained by Bhattacharyya [5], and
some results were obtained earlier by others whose results are referred to by
Bhattacharyya. His work has been extended to sequential problems as indicated
in section 3 above by G. R. Seth in a doctoral dissertation at Columbia University.
This leads to results analogous to, but in some respects more general
than those of Wolfowitz [6]. Among other papers on sequential estimation,
there are the one by Blackwell [4] already referred to, and the one by Girshick,
Mosteller, and Savage [7]. These deal mainly with problems in which there is a
unique unbiased estimate based on a sufficient statistic.

The author is indebted to A. Wald, J. L. Hodges, E. Barankin, and H. Rubin
for some helpful suggestions and comments.

REFERENCES

[1] B. v. Sz. Nagy, "Spektraldarstellung linearer Transformationen des Hilbertschen
Raumes," Ergebnisse der Mathematik, Vol. 5, No. 5 (1942).

[2] S. Saks, Theory of the Integral, Monografie Matematyczne, Tom VII, Warsaw, 1937.

[3] L. M. Milne-Thomson, The Calculus of Finite Differences, Macmillan, London, 1933.

[4] D. Blackwell, "Conditional expectation and unbiased sequential estimation," Annals
of Math. Stat., Vol. 18 (1947), p. 105.

[5] A. Bhattacharyya, "On some analogues of the amount of information and their use in
statistical estimation," Sankhya, Vol. 8 (1946), p. 1 and Vol. 8 (1947), p. 201.

[6] J. Wolfowitz, "The efficiency of sequential estimates and Wald's equation for
sequential processes," Annals of Math. Stat., Vol. 18 (1947), p. 215.

[7] M. Girshick, F. Mosteller, and L. Savage, "Unbiased estimates for certain binomial
sampling problems," Annals of Math. Stat., Vol. 17 (1946), p. 13.



DISTRIBUTION OF MAXIMUM AND MINIMUM FREQUENCIES IN A 
SAMPLE DRAWN FROM A MULTINOMIAL DISTRIBUTION 

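By Robert E. Greenwood and Mark O. Glasgow

University of Texas


1. Introduction. In this paper, the expected values

(1.1)  $E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2, \cdots, n_l) \right] = \sum_{n_1 + n_2 + \cdots + n_k = N} \dfrac{N!}{n_1!\, n_2! \cdots n_k!} \left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2, \cdots, n_l) \right] p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$

will be studied. The quantities $\{n_i\}$, $i = 1, 2, \cdots, k$, are understood to be
non-negative integers, and the quantities $\{p_i\}$ are non-negative probabilities,
$\sum p_i = 1$. Also, $l \le k$. Form (1.1) will be evaluated for the binomial case $l = k = 2$
and for the special trinomial case $p_1 = p_2$ with $l = 2$, $k = 3$.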


2. Binomial distribution. The evaluations for the expected values in the
binomial case can be given explicitly in terms of the incomplete Beta function.
This function may be defined by the relation

(2.1)  $I_q(n - k, k + 1) = \sum_{r=0}^{k} \binom{n}{r} (1 - q)^r q^{n-r}$

whence

$I_{1-q}(k + 1, n - k) = \sum_{r=k+1}^{n} \binom{n}{r} (1 - q)^r q^{n-r}.$

It is seen that

(2.2)  $I_q(n - k, k + 1) + I_{1-q}(k + 1, n - k) = 1.$

For the binomial case, $n_2 = N - n_1$ and $p_2 = 1 - p_1$, and thus instead of
$(n_1, n_2)$ and $(p_1, p_2)$ one may use $(n, N - n)$ and $(p, 1 - p)$ without any subscripts
and without sacrifice of clarity. This will be done in some instances in what
follows. The evaluation of

(2.3)  $E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] = \sum_{n=0}^{N} \binom{N}{n} \left[ \begin{matrix} \max \\ \min \end{matrix} (n, N - n) \right] p^n (1 - p)^{N-n}$

is slightly different for the two cases $N$ odd and $N$ even.

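For $N$ odd, and for the minimum form, the summation may be written in two
parts, (a) and (b),

(a)  $0 \le n \le \dfrac{N - 1}{2}$,

in which range $\min(n, N - n) = n$, and

(b)  $\dfrac{N + 1}{2} \le n \le N$,

in which range $\min(n, N - n) = N - n$. In the (a) part summation one gets

$\sum_{n=0}^{(N-1)/2} \binom{N}{n} n\, p^n (1 - p)^{N-n} = Np \sum_{n=1}^{(N-1)/2} \binom{N-1}{n-1} p^{n-1} (1 - p)^{N-n} = Np\, I_{1-p}\left( \dfrac{N+1}{2}, \dfrac{N-1}{2} \right).$

In the (b) part summation one gets

$\sum_{n=(N+1)/2}^{N} \binom{N}{n} (N - n)\, p^n (1 - p)^{N-n} = N(1 - p) \sum_{n=(N+1)/2}^{N-1} \binom{N-1}{n} p^n (1 - p)^{N-n-1} = N(1 - p)\, I_p\left( \dfrac{N+1}{2}, \dfrac{N-1}{2} \right).$

Similar algebraic manipulations, supplemented by symmetry, can be used to
effect the evaluations tabulated below.

For $N$ odd there result the forms

(2.4)  $E[\min(n_1, n_2)] = Np\, I_{1-p}\left( \dfrac{N+1}{2}, \dfrac{N-1}{2} \right) + N(1 - p)\, I_p\left( \dfrac{N+1}{2}, \dfrac{N-1}{2} \right),$

       $E[\max(n_1, n_2)] = Np\, I_p\left( \dfrac{N-1}{2}, \dfrac{N+1}{2} \right) + N(1 - p)\, I_{1-p}\left( \dfrac{N-1}{2}, \dfrac{N+1}{2} \right).$

For $N$ even there result the forms

(2.5)  $E[\min(n_1, n_2)] = Np\, I_{1-p}\left( \dfrac{N}{2}, \dfrac{N}{2} \right) + N(1 - p)\, I_p\left( \dfrac{N}{2} + 1, \dfrac{N}{2} - 1 \right),$

       $E[\max(n_1, n_2)] = Np\, I_p\left( \dfrac{N}{2}, \dfrac{N}{2} \right) + N(1 - p)\, I_{1-p}\left( \dfrac{N}{2} - 1, \dfrac{N}{2} + 1 \right).$

For this simple binomial case, $\max(n_1, n_2) + \min(n_1, n_2) = N$ and linearity
in the expected value operator used in (2.3) preserves this relation, so that one
obtains

(2.6)  $E[\min(n_1, n_2)] + E[\max(n_1, n_2)] = N.$

Thus (2.6) and (2.2) could have been used in evaluating some of the forms above,
or can be used as a check on the evaluations.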



To compute the variance

(2.7)  $\sigma^2(x) = E[\{x - E[x]\}^2] = E[x^2] - \{E[x]\}^2$

it will be convenient to note that for the binomial case

(2.8)  $\sigma^2_{\max} = \sigma^2_{\min}$

where

(2.9)  $\sigma^2_{\max,\,\min} = E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1^2, n_2^2) \right] - \left\{ E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] \right\}^2$

and where because of the non-negative character of $n_1$ and $n_2$

$\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right]^2 = \begin{matrix} \max \\ \min \end{matrix} (n_1^2, n_2^2).$

To prove (2.8), note that for this binomial case

$\{\max(n_1, n_2) - E[\max(n_1, n_2)]\}^2 = \{\min(n_1, n_2) - E[\min(n_1, n_2)]\}^2,$

and thus each term for $\sigma^2_{\max}$ has its counterpart for $\sigma^2_{\min}$ when using the first part
of (2.7) to compute these variances, and hence (2.8) must be true.

Defining $\sigma^2$ as the common value, one gets

(2.10)  $2\sigma^2 = \sigma^2_{\max} + \sigma^2_{\min} = E[\max(n_1^2, n_2^2)] + E[\min(n_1^2, n_2^2)] - \{E[\max(n_1, n_2)]\}^2 - \{E[\min(n_1, n_2)]\}^2.$

The value of the sum

$E[\max(n_1^2, n_2^2)] + E[\min(n_1^2, n_2^2)]$

is somewhat easier to obtain than that of either part. For, $\max(n_1^2, n_2^2)$ is one
of the integers $(n_1^2, n_2^2)$ and $\min(n_1^2, n_2^2)$ is the other integer. Linearity in the
expected value form then gives

(2.11)  $E[\max(n_1^2, n_2^2)] + E[\min(n_1^2, n_2^2)] = E[n^2 + (N - n)^2] = N^2 p^2 + 2Np(1 - p) + N^2 (1 - p)^2,$

a relation which is similar to (2.6).

Likewise one gets

(2.12)  $\{E[\max(n_1, n_2)]\}^2 + \{E[\min(n_1, n_2)]\}^2 = \{E[\max(n_1, n_2)] + E[\min(n_1, n_2)]\}^2 - 2E[\max(n_1, n_2)]\, E[\min(n_1, n_2)] = N^2 - 2E[\max(n_1, n_2)]\, E[\min(n_1, n_2)].$

Substituting the results of (2.11) and (2.12) into (2.10), and solving for $\sigma^2$ one
gets

(2.13)  $\sigma^2 = E[\max(n_1, n_2)]\, E[\min(n_1, n_2)] - N(N - 1)p(1 - p)$
        $= E[\max(n_1, n_2)]\{N - E[\max(n_1, n_2)]\} - N(N - 1)p(1 - p)$
        $= E[\min(n_1, n_2)]\{N - E[\min(n_1, n_2)]\} - N(N - 1)p(1 - p).$

If one desires, one can make independent evaluations of $E[\max(n_1^2, n_2^2)]$ and
$E[\min(n_1^2, n_2^2)]$ and compute the variances from relation (2.9). Such evaluations
bring into play the incomplete Beta functions at four different sets of values,
with separate sets for $N$ odd and $N$ even. Relations (2.13) seem preferable to
this suggested "strong-arm" procedure. A proof of relation (2.8) by this means
seems to be unduly algebraically complicated.
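Relations (2.6), (2.8) and (2.13) lend themselves to a direct numerical check by summing over the binomial law. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the values of `N` and `p` are arbitrary, the function names are hypothetical, and the incomplete Beta function with integer arguments is evaluated through the standard identity $I_x(a, b) = P(\mathrm{Bin}(a + b - 1, x) \ge a)$. The last line compares the exact mean of the minimum with the form $Np\, I_{1-p}\left(\frac{N+1}{2}, \frac{N-1}{2}\right) + N(1-p)\, I_p\left(\frac{N+1}{2}, \frac{N-1}{2}\right)$ for $N$ odd.

```python
from math import comb


def moments(N, p):
    """Exact means and variances of min(n, N - n) and max(n, N - n)
    when n follows the binomial law with parameters N and p."""
    w = [comb(N, n) * p**n * (1 - p) ** (N - n) for n in range(N + 1)]
    e_min = sum(w[n] * min(n, N - n) for n in range(N + 1))
    e_max = sum(w[n] * max(n, N - n) for n in range(N + 1))
    v_min = sum(w[n] * min(n, N - n) ** 2 for n in range(N + 1)) - e_min**2
    v_max = sum(w[n] * max(n, N - n) ** 2 for n in range(N + 1)) - e_max**2
    return e_min, e_max, v_min, v_max


def inc_beta(x, a, b):
    """Incomplete Beta ratio I_x(a, b) for positive integers a, b,
    computed as the binomial tail P(Bin(a + b - 1, x) >= a)."""
    m = a + b - 1
    return sum(comb(m, r) * x**r * (1 - x) ** (m - r) for r in range(a, m + 1))


N, p = 11, 0.3                      # illustrative values, N odd
e_min, e_max, v_min, v_max = moments(N, p)
h = (N + 1) // 2

print(e_min + e_max, N)                                          # (2.6)
print(v_min, v_max)                                              # (2.8)
print(v_min, e_max * e_min - N * (N - 1) * p * (1 - p))          # (2.13)
print(e_min, N * p * inc_beta(1 - p, h, h - 1)
      + N * (1 - p) * inc_beta(p, h, h - 1))                     # N odd form
```

Each printed pair should agree to rounding error. For $N$ even the same `moments` function applies unchanged and only the incomplete Beta arguments differ.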


3. Normal approximation to the binomial distribution. If numerical values for
large $N$ are desired (beyond the range of tabulated values of the incomplete
Beta Function) an approximation based on the normal distribution may be used.
Let

(3.1)  $n_1 = Np_1 + x, \quad n_2 = N - n_1 = N(1 - p_1) - x,$

where the subscripts may be dropped when not needed for clarity.
Then one has

(3.2)  $\binom{N}{n} p^n (1 - p)^{N-n} \approx \dfrac{1}{[2\pi Np(1 - p)]^{1/2}} \exp\left[ \dfrac{-x^2}{2Np(1 - p)} \right].$

To evaluate the minimum approximation, note that there are two ranges

(a)  $-\infty < x < \dfrac{N}{2} - Np$,

in which range $\min(x + Np, N(1 - p) - x) = x + Np$,

(b)  $\dfrac{N}{2} - Np < x < \infty$,

in which range $\min(x + Np, N(1 - p) - x) = N(1 - p) - x$. Defining

(3.3)  $A(M) = \dfrac{1}{(2\pi)^{1/2}} \int_{-\infty}^{M} \exp\left( -\tfrac{1}{2} t^2 \right) dt,$

a tabulated function, the integrations may be evaluated as

(3.4)  $E[\min(n_1, n_2)] \approx Np\, A(M) + N(1 - p)[1 - A(M)] - \left[ \dfrac{2Np(1 - p)}{\pi} \right]^{1/2} \exp\left[ \dfrac{-N(1 - 2p)^2}{8p(1 - p)} \right],$

       $E[\max(n_1, n_2)] \approx N(1 - p)\, A(M) + Np[1 - A(M)] + \left[ \dfrac{2Np(1 - p)}{\pi} \right]^{1/2} \exp\left[ \dfrac{-N(1 - 2p)^2}{8p(1 - p)} \right],$

where

$M = \dfrac{N(1 - 2p)}{2[Np(1 - p)]^{1/2}}.$

Note also that (2.6) holds for these approximate evaluations.
For the variance, approximations (3.4) may be used in relations (2.13). Or,
alternately, the variances may be computed by "strong-arm" methods using
the definition (2.9). In this case, using the averaging defined implicitly by (2.10)
one gets the evaluation

(3.5)  $\sigma^2 \approx N^2 A(M)[1 - A(M)][1 - 2p]^2 + Np(1 - p) + N(1 - 2p)[1 - 2A(M)] \left[ \dfrac{2Np(1 - p)}{\pi} \right]^{1/2} \exp\left[ \dfrac{-N(1 - 2p)^2}{8p(1 - p)} \right] - \dfrac{2Np(1 - p)}{\pi} \exp\left[ \dfrac{-N(1 - 2p)^2}{4p(1 - p)} \right].$


It would seem preferable to use relations (2.13) rather than the above; for that
reason the evaluation of forms (2.9) has not been included here.
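The quality of the normal approximation to the expected minimum can be gauged against the exact binomial sum. The sketch below assumes that $A(M)$ is the standard normal distribution function and that $M = (N/2 - Np)/[Np(1 - p)]^{1/2}$, which is the standardized point at which the two ranges (a) and (b) meet. The function names and the values of `N` and `p` are illustrative assumptions.

```python
from math import comb, erf, exp, pi, sqrt


def exact_min_mean(N, p):
    """Exact E[min(n, N - n)] under the binomial law."""
    return sum(comb(N, n) * p**n * (1 - p) ** (N - n) * min(n, N - n)
               for n in range(N + 1))


def normal_cdf(z):
    """Standard normal distribution function, playing the role of A(M)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def approx_min_mean(N, p):
    """Normal approximation to E[min(n, N - n)] (assumed reading of (3.4))."""
    s = sqrt(N * p * (1 - p))
    M = (N / 2 - N * p) / s
    A = normal_cdf(M)
    correction = (sqrt(2 * N * p * (1 - p) / pi)
                  * exp(-N * (1 - 2 * p) ** 2 / (8 * p * (1 - p))))
    return N * p * A + N * (1 - p) * (1 - A) - correction


for N, p in ((100, 0.5), (100, 0.3), (400, 0.45)):
    # exact value first, approximation second
    print(N, p, exact_min_mean(N, p), approx_min_mean(N, p))
```

By (2.6) the corresponding approximation to the expected maximum is `N - approx_min_mean(N, p)`, so only the minimum needs to be coded.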


4. Trinomial distributions. The form

(4.1)  $E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] = \sum_{n_1 + n_2 + n_3 = N} \dfrac{N!}{n_1!\, n_2!\, n_3!} \left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] p_1^{n_1} p_2^{n_2} p_3^{n_3}$

may be approximated, for large $N$, by the bivariate normal distribution. Suppose
two attributes $P$ (and not $P = \bar{P}$) and $R$ (and not $R = \bar{R}$) are being observed
in a distribution. Then the four possible outcomes of an experiment
could be represented as the categories $PR$, $P\bar{R}$, $\bar{P}R$, $\bar{P}\bar{R}$ with respective
probabilities $a, b, c, d$; $a + b + c + d = 1$. In such a situation, for large $N$, one may use
a bivariate normal distribution as a limiting form of the above described bivariate
binomial distribution, or multinomial distribution with four categories.

If the probability of one category, say PR, is zero, the bivariate normal
distribution can be regarded as a limiting form of a trinomial distribution.

Indeed, defining

(4.2)  $x_1 = \dfrac{n_1 - Np_1}{[Np_1(1 - p_1)]^{1/2}}, \quad x_2 = \dfrac{n_2 - Np_2}{[Np_2(1 - p_2)]^{1/2}},$

the bivariate normal distribution takes the form [1]

(4.3)  $dF = \dfrac{1}{2\pi (1 - r^2)^{1/2}} \exp\left[ \dfrac{-1}{2(1 - r^2)} (x_1^2 - 2r x_1 x_2 + x_2^2) \right] dx_1\, dx_2, \quad -\infty < x_1, x_2 < \infty,$

where

$r = -\left[ \dfrac{p_1 p_2}{(1 - p_1)(1 - p_2)} \right]^{1/2}.$

The expected values are then given approximately by

(4.4)  $E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] \approx \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] dF.$

For the special case $p_1 = p_2$, evaluations have been made of $E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right]$
by the authors. For the finite summation (4.1), powers of $N$ less than the
one-half power were neglected, and the values

(4.5)  $E[\min(n_1, n_2)] \approx Np - \left( \dfrac{Np}{\pi} \right)^{1/2}, \quad E[\max(n_1, n_2)] \approx Np + \left( \dfrac{Np}{\pi} \right)^{1/2}$

were obtained.
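For the integral case, again for $p_1 = p_2 = p$ and hence for $r = -p/(1 - p)$,
the evaluation proceeds as follows. In virtue of (4.2) and (4.3)

(4.6)  $E[\min(n_1, n_2)] \approx Np + [Np(1 - p)]^{1/2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} [\min(x_1, x_2)]\, dF = Np + [Np(1 - p)]^{1/2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} [\min(x_1 - x_2, 0)]\, dF.$

It is convenient to introduce a rotation of axes in order to evaluate integral
(4.6). Indeed, rotation through $\pi/4$ radians will give

(4.7)  $x_1 = \dfrac{y_1}{\sqrt{2}} - \dfrac{y_2}{\sqrt{2}}, \quad x_2 = \dfrac{y_1}{\sqrt{2}} + \dfrac{y_2}{\sqrt{2}},$

with

(4.8)  $\dfrac{x_1^2 - 2r x_1 x_2 + x_2^2}{1 - r^2} = \dfrac{1 - p}{1 - 2p}\, y_1^2 + (1 - p)\, y_2^2,$

(4.9)  $\min(x_1 - x_2, 0) = \min(-y_2 \sqrt{2}, 0),$

(4.10)  $\dfrac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = 1.$

Thus integral (4.6) becomes

(4.11)  $E[\min(n_1, n_2)] \approx Np + [Np(1 - p)]^{1/2}\, \dfrac{1 - p}{2\pi (1 - 2p)^{1/2}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \min(-y_2 \sqrt{2}, 0) \exp\left[ -\dfrac{(1 - p)}{2(1 - 2p)}\, y_1^2 - \dfrac{(1 - p)}{2}\, y_2^2 \right] dy_1\, dy_2$

        $= Np - [Np(1 - p)]^{1/2}\, \dfrac{\sqrt{2}\,(1 - p)}{2\pi (1 - 2p)^{1/2}} \int_0^{\infty} y_2 \exp\left[ -\dfrac{(1 - p)}{2}\, y_2^2 \right] \left\{ \int_{-\infty}^{\infty} \exp\left[ -\dfrac{(1 - p)}{2(1 - 2p)}\, y_1^2 \right] dy_1 \right\} dy_2.$

As indicated above, it is convenient to consider the form as an iterated integral,
and integrate first with respect to $y_1$. The evaluation of (4.11) presents no
serious difficulties,

(4.12)  $E[\min(n_1, n_2)] \approx Np - [Np(1 - p)]^{1/2} \left[ \dfrac{1}{\pi (1 - p)} \right]^{1/2} = Np - \left( \dfrac{Np}{\pi} \right)^{1/2}.$

Likewise

$E[\max(n_1, n_2)] \approx Np + \left( \dfrac{Np}{\pi} \right)^{1/2}.$

Note that these values are the same as those obtained from the finite summation
form (4.1), as given by (4.5).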

To evaluate the variance 


(413) 


a finite summation form similar to (4.1) or an integral form similar to (4.4) may 
be used. 

In case the integral form is used, it is convenient to introduce the variables 
Xi and Xi as defined by (4.2). One then gets 

A;[min (nl, nl)] = NV + Np(l — p) 


(4.14) 


■E[uan(xl + 2 [A J ; a;? + 2 ^ij\ 

= NV + Np{l — p) + Npil — p) 


■E 




{xx - 3 : 2 ); 0 




MAXIMUM AND MINIMUM JBEQUENCIES 


423 


in -which one mtegratioi over the whole space has been earned out. Rotating 
axes as per (4.7) one gets 


(4.15) 


E[mm (nl , nl)] = NY + Npil - p) + 2i\rp(l - p) 




In evaluating this last expected value form, the region of integration may be con¬ 
sidered as a sum of separate regions. Over some regions the integrand is zero, 
in other regions the non-negative product 



is the integrand and this condition gives 



as the regions of integration with the non-negative product as integrand. 

Since the assumption that N is large has already been made, it is convenient 
to approximate further here and assume [2iVp/(l — p)]* is large, and in particular 
to assume that integration from — [22Vp/ (1 — p)]* to -t-» is equal to integra¬ 
tion from — 00 to -f- 00 for the integrand under consideration and for iterated 
integration with respect to the variable j/i. 

Remark: An equivalent assumption is needed in the finite summation case 
when approximating (Np) 1 by the use of Stirling’s formula. 

Thus one gets (since one of the above regions of mtegration is to be neglected) 


E ["min 




(4.16) 


• exp i 1/1 - o i/»l 1 <^1/1 


-1 


2Tr(l - 2p)i 

_ _/^Y 

1 — P \ TT / 


,2(1 - 2p) " - 2 


Collecting results from (4.13), (4.15) and (4.16) one obtains

(4.17) $\sigma^2_{\min} \cong Np\left(1 - p - \frac{1}{\pi}\right).$

By a similar procedure, one may compute also that

(4.18) $\sigma^2_{\max} \cong Np\left(1 - p - \frac{1}{\pi}\right).$
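
The large-N variance approximation in (4.17) can be checked roughly by simulation. The following is a minimal sketch, not part of the original paper: it assumes the three-category setting of this section with $p_1 = p_2 = p$ and $p_3 = 1 - 2p$, and the function names, the sample size, and the number of trials are illustrative choices.

```python
import math
import random


def approx_var_min(N, p):
    """Large-N approximation (4.17): N p (1 - p - 1/pi)."""
    return N * p * (1.0 - p - 1.0 / math.pi)


def simulate_var_min(N, p, trials, seed=0):
    """Sample variance of min(n1, n2) over repeated trinomial samples.

    Each of the N observations falls in category 1 with probability p,
    in category 2 with probability p, and otherwise in category 3.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        n1 = n2 = 0
        for _ in range(N):
            u = rng.random()
            if u < p:
                n1 += 1
            elif u < 2.0 * p:
                n2 += 1
        values.append(min(n1, n2))
    mean = sum(values) / trials
    return sum((v - mean) ** 2 for v in values) / (trials - 1)
```

For moderate N the simulated variance should lie close to, though not exactly on, the approximation, since (4.17) keeps only the leading term in N. By symmetry the same comparison applies to max(n1, n2) and (4.18).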





For this three category case, the proof used to obtain relation (2.8) is no longer
applicable, yet the relation $\sigma^2_{\max} = \sigma^2_{\min}$ still holds for the approximating
relations given above.

5. Conclusion. Since the normal distribution was used in some instances to
obtain approximations for the binomial and multinomial distributions, many 
of the maximum and minimum relations stated as approximations for the multi¬ 
nomial are exact for the appropriate normal distribution. 

No convenient formulation was found for the general trinomial case ($p_1$,
$p_2$, $p_3$ unequal) similar to relations (4.5), (4.17), and (4.18).

As possible applications of the general solution of this problem, the referee 
has kindly supplied the authors with a reference of Guttman [2]. Sampling 
theory provided by the general solution to this problem could be used in connec¬ 
tion with Guttman’s reliability coefficient. 

REFERENCES 

[1] M. G. Kendall, The Advanced Theory of Statistics, Vol. I, 3rd edition, Charles Griffin
and Co., 1947, p. 133.

[2] Louis Guttman, "The test-retest reliability of qualitative data," Psychometrika, Vol.
11 (1946), pp. 81-95.



DERIVATION OF A BROAD CLASS OF CONSISTENT ESTIMATES 

By R. C. Davis 

U. S. Naval Ordnance Test Station, Inyokern, California

1. Summary. Given a chance vector X with distribution function $F(X, \theta_T)$,
where $\theta_T$ denotes the true unknown parameter vector, a broad class of estimates
of $\theta_T$ is derived which is shown to be identical with the class of all consistent
estimates of $\theta_T$. A sub-class is obtained each member of which has the following
properties: a) its construction depends upon the solution of an equation involving
a single vector function of the parameter vector $\theta$ and the members of
a sequence $\{X_n\}$ of independent and identically distributed chance vectors;
b) the estimate so obtained converges almost certainly to $\theta_T$; c) it is a symmetric
function of the members of the sequence $\{X_n\}$. In order to obtain this sub-class
it is postulated that a function of X and $\theta$ exists (continuous in $\theta$ for a
certain neighborhood of the true parameter $\theta_T$ and existing for each X in a subset
of the sample space) which satisfies a Lipschitz condition in $\theta$. In particular
if a density function $f(X, \theta_T)$ exists satisfying certain conditions, the consistency
of the maximum likelihood estimate can be established under regularity conditions
quite different from those usually assumed [1]. This is not to be interpreted
as a weakening of the usual regularity conditions but rather as an extension of
the class of consistent likelihood estimates obtained under the usual regularity
conditions.

2. Introduction. The present work is the result of investigations into the
following question posed by J. Neyman: What happens to the asymptotic
properties of the maximum likelihood estimate of $\theta_T$ when the usual regularity
conditions on $F(X, \theta)$ are relaxed? The consistency and efficiency of the estimate
are the properties in question, and the present work arose from the observation
that consistency at least can be obtained under conditions much different
from those usually assumed [1]. The assumptions made below are existential
in nature, and no general methods are given for the actual construction
of consistent estimates. As stated above, however, the results of this work can
be used to widen the class of consistent maximum likelihood estimates established
heretofore. Although simple upper and lower bounds for the variance of a consistent
estimate are obtained, no answer is given to the question of determining
the efficiency of such an estimate. In regard to consistent estimates, J. Neyman
and E. Scott have discussed recently [2] the need for a systematic method of
obtaining consistent estimates. Wald has given necessary and sufficient conditions
[3] for the existence of a uniformly consistent estimate of an unknown parameter
$\theta$ when there exists a density function continuous jointly in all of its
arguments, and it is assumed that the domain of each of the unknown parameters
is a closed and bounded set. It is hoped that the class of consistent estimates



derived below will help shed some light on a general method for actually obtaining
such estimates. In this connection it is important to point out that if
necessary and sufficient conditions were known for the existence and uniqueness
of a fixed point for a transformation on $E_m$ to $E_m$, the weakest possible conditions
could be expressed for the existence of consistent estimates obtained in the
manner given below. It is surmised that the use of a Hölder condition of order
one as presented below is stronger than required.

Let $\{X_i\}$, $i = 1, 2, \cdots, n, \cdots$, be a sequence of chance vectors in which
$X_i$ possesses the probability distribution function $F_i(X, \theta)$ depending upon an
unknown parameter vector $\theta$. The vector X has s components, each of which
is a chance variable, and $\theta$ has components $\theta_j$, $j = 1, 2, \cdots, m$. The
problem is to obtain a function of the $X_i$ which is a consistent estimate of $\theta$.
We denote by $E_s$ the real Euclidean space of s dimensions and by $E'_s$ a subset
of $E_s$ excluding at most a set of probability measure zero. For convenience we
use the symbol $\|\theta\|$ to denote the norm of $\theta$, where

1|0|1 = + ••• + 

We define in a similar manner the norm of any function which assumes values 
in Em ■ The following assumption is made: 

Assumption 1. There exists a point $\theta_0$ and a neighborhood $W(\theta_0, a)$ of $\theta_0$ having
radius a (a > 0) which contains the true parameter vector $\theta_T$ as an interior point
and there exists an infinite sequence of functions $G_n(X_1, X_2, \cdots, X_n; \theta)$, n =
1, 2, $\cdots$, ad inf. on $E_s \times E_m$ to $E_m$ such that

(a) for each n the equation

$$G_n(X_1, X_2, \cdots, X_n; \theta) = 0$$

has a unique solution $\theta = \theta_n^*(X_1, X_2, \cdots, X_n)$ in $W(\theta_0, a)$. (For the sake of
brevity we usually write $G_n(X; \theta) = G_n(X_1, X_2, \cdots, X_n; \theta)$.)

(b) For every pair of values $\theta_1$, $\theta_2$ in $W(\theta_0, a)$ and for some K with 0 < K < 1

$$\lim_{n \to \infty} P\{\| G_n(X, \theta_1) - G_n(X, \theta_2) - (\theta_1 - \theta_2) \| \leq K \| \theta_1 - \theta_2 \|\} = 1.$$

(c) For every $\epsilon > 0$,

$$\lim_{n \to \infty} P\{\| G_n(X, \theta_T) \| < \epsilon\} = 1.$$

3. A consistent estimate of $\theta_T$.

Theorem 3.1. The solution $\theta = \theta_n^*(X_1, X_2, \cdots, X_n)$ of the equation

$$G_n(X_1, X_2, \cdots, X_n; \theta) = 0$$

is a consistent estimate of $\theta_T$, providing $G_n(X; \theta)$ satisfies Assumption 1.

Proof: From Assumption 1b it follows that given $\delta > 0$, we have for all
$n > N'(\delta)$,

(3.1) $P\{\| G_n(X, \theta_T) - (\theta_T - \theta_n^*) \| \leq K \| \theta_T - \theta_n^* \|\} > 1 - \delta/2,$





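since $G_n(X, \theta_n^*) = 0$. It follows from (3.1) that for all $n > N'(\delta)$,

(3.2) $P\left\{\frac{\| G_n(X, \theta_T) \|}{1 + K} \leq \| \theta_T - \theta_n^* \| \leq \frac{\| G_n(X, \theta_T) \|}{1 - K}\right\} > 1 - \delta/2.$

From Assumption 1c it follows that there exists $N''(\epsilon, \delta)$ such that $n > N''(\epsilon, \delta)$
implies

(3.3) $P\{\| G_n(X, \theta_T) \| < \epsilon(1 - K)\} > 1 - \delta/2.$

(3.2), (3.3), and a familiar formula in probability imply for all

$$n > \max[N'(\delta), N''(\epsilon, \delta)],$$

$$P\{\| \theta_T - \theta_n^* \| < \epsilon\} > 1 - \delta.$$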

It is noted that (3.2) characterizes the speed of convergence of the estimate
$\theta_n^*$. The following uniqueness property is noted: If a given sequence of functions
$G_n(X_1, X_2, \cdots, X_n; \theta)$ satisfies Assumption 1, then $\theta_T$ is the unique parameter
vector in $W(\theta_0, a)$ which satisfies item c of Assumption 1. The proof of this remark
is left to the reader.

The following remark demonstrates the extreme generality of the class of
consistent estimates obtained in the above manner: The set of estimates of the
parameter vector $\theta_T$ obtained from the class of all sequences of functions

$$G_n(X_1, X_2, \cdots, X_n; \theta)$$

satisfying Assumption 1 is identical with the set of all consistent estimates of the
parameter vector $\theta_T$. The proof of this remark is quite obvious and is left to the
reader.


4. Properties of a sub-class of consistent estimates. The question arises
naturally concerning a general method for the construction of a sequence of
functions $G_n(X_1, X_2, \cdots, X_n; \theta)$ satisfying Assumption 1. The author knows
of no general method. It is possible to describe a sub-class of the class of consistent
estimates, the construction of which depends upon the existence of one
function rather than a sequence of functions. This is possible by application
of the strong law of large numbers, and in this way consistent estimates of the
parameter vector are obtained which converge almost certainly to the true
value $\theta_T$. Moreover it is clear that under certain conditions the function

$$G_n(X_1, X_2, \cdots, X_n; \theta_T)$$

defined as in equation (4.1) below is an asymptotically m-variate normal variable.

Assumption 2. Let $\{X_i\}$, $i = 1, 2, \cdots, n, \cdots$, be a sequence of independently
and identically distributed chance vectors with common distribution function $F(X; \theta)$,
where $\theta$ is again the unknown parameter vector.

Assumption 3. There exists a function $g(X, \theta)$ on $E_s \times E_m$ to $E_m$ such that
(a) for every $X \in E'_s$ and every distinct pair $(\theta_1, \theta_2)$ in $W(\theta_0, a)$,

$$\| g(X, \theta_1) - g(X, \theta_2) - (\theta_1 - \theta_2) \| \leq K \| \theta_1 - \theta_2 \|,$$

where 0 < K < 1 and $\| g(X, \theta_0) \| < (1 - K)a$.

(b) $$\int_{E_s} g(X, \theta_T)\, dF(X, \theta_T) = 0.$$

We define the function $G_n(X, \theta)$ as follows:

(4.1) $$G_n(X, \theta) = \frac{1}{n} \sum_{i=1}^{n} g(X_i, \theta).$$

The following lemmas are required:

Lemma 4.1. $G_n(X, \theta)$ as defined in (4.1) satisfies the conditions in Assumption
3 with $G_n(X, \theta)$ replacing $g(X, \theta)$.

The proof is sufficiently obvious to be omitted.

Lemma 4.2. $G_n(X, \theta_T) \to 0$ almost certainly as $n \to \infty$, if Assumptions 2 and
3b hold.

Proof: Since $E g(X_i, \theta_T) = 0$, $i = 1, 2, \cdots, n$, and the chance variables
$g(X_i, \theta_T)$ are independently and identically distributed, this follows immediately
from a theorem due to Kolmogorov [5].

Theorem 4.1. If Assumptions 2 and 3 hold, then the equation $G_n(X, \theta) = 0$
has a unique solution $\theta = \theta_n^*(X_1, X_2, \cdots, X_n)$ in $W(\theta_0, a)$, where $\theta_n^*$ is a consistent
estimate of $\theta_T$ and is moreover a symmetric function of the observation vectors
$X_1, X_2, \cdots, X_n$.

Proof: We obtain the solution $\theta_n^*$ by the method of successive substitutions.
Define

$$\theta_1 = \theta_0 - G_n(X, \theta_0), \cdots, \theta_{j+1} = \theta_j - G_n(X, \theta_j).$$

In view of Lemma 4.1 we can apply a well known existence theorem [4] in the
theory of functions to prove that the sequence $\{\theta_j\}$ converges to a limit $\theta_n^*$ which
is also in $W(\theta_0, a)$. The same theorem establishes the uniqueness of the solution
in $W(\theta_0, a)$. This uniqueness property together with Lemmas 4.1 and 4.2 establishes
the fact that the sequence $\{G_n(X, \theta)\}$ as defined in equation (4.1) satisfies
Assumption 1. It follows immediately from Theorem 3.1 that $\theta_n^*$ is a consistent
estimate of $\theta_T$. We can, however, prove a stronger relationship.

Theorem 4.2. The estimate $\theta_n^*$ defined in Theorem 4.1 converges almost certainly
to $\theta_T$.

Proof: From Lemma 4.2 we know that given any number $\epsilon > 0$, there exists
an integer $N(\epsilon)$ such that for all $n > N(\epsilon)$

$$P\{\| G_n(X, \theta_T) \| < \epsilon(1 - K)\} = 1.$$

From Assumption 3a and Lemma 4.1 we see that

$$\| G_n(X, \theta_T) - (\theta_T - \theta_n^*) \| \leq K \| \theta_T - \theta_n^* \|,$$

since $G_n(X, \theta_n^*) = 0$. Then

$$\| G_n(X, \theta_T) \| \geq (1 - K) \| \theta_T - \theta_n^* \|.$$





Clearly the set of $X \in E_s$ for which $\| \theta_T - \theta_n^* \| < \epsilon$ includes the set of X for
which $\| G_n(X, \theta_T) \| < \epsilon(1 - K)$.

Therefore, for $n > N(\epsilon)$,

$$P\{\| \theta_T - \theta_n^* \| < \epsilon\} \geq P\{\| G_n(X, \theta_T) \| < \epsilon(1 - K)\} = 1,$$

and the proof is completed.

The uniqueness of the parameter value $\theta_T$ in the neighborhood $W(\theta_0, a)$
follows immediately from the remark succeeding Theorem 3.1 since Assumption
1 is valid in Theorems 4.1 and 4.2.
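
The method of successive substitutions used in the proof of Theorem 4.1 can be sketched numerically. The example below is an illustration only and is not taken from the paper: it assumes a scalar parameter, the hypothetical function g(x, θ) = θ + K sin θ − x with K = 0.5 (which satisfies the Lipschitz-type inequality of Assumption 3a for all θ), and normally distributed observations whose mean is chosen so that E g(X, θ_T) = 0.

```python
import math
import random

K = 0.5  # constant in Assumption 3a for the illustrative g below


def g(x, theta):
    """Hypothetical g: |g(x,t1) - g(x,t2) - (t1 - t2)| = K|sin t1 - sin t2| <= K|t1 - t2|."""
    return theta + K * math.sin(theta) - x


def G_n(sample, theta):
    """Equation (4.1): the sample average of g(X_i, theta)."""
    return sum(g(x, theta) for x in sample) / len(sample)


def successive_substitution(sample, theta0=0.0, tol=1e-10, max_iter=500):
    """Iterate theta_{j+1} = theta_j - G_n(X, theta_j) until the step is tiny."""
    theta = theta0
    for _ in range(max_iter):
        step = G_n(sample, theta)
        theta = theta - step
        if abs(step) < tol:
            return theta
    raise RuntimeError("iteration did not converge")


def draw_sample(theta_true, n, seed=0):
    """Normal observations with mean chosen so that E g(X, theta_true) = 0."""
    rng = random.Random(seed)
    mean = theta_true + K * math.sin(theta_true)
    return [rng.gauss(mean, 1.0) for _ in range(n)]
```

With `sample = draw_sample(1.0, 5000)`, the value returned by `successive_substitution(sample)` plays the role of $\theta_n^*$; because the iteration map is a contraction with constant K, the distance to the solution shrinks at least geometrically, and the resulting estimate satisfies the two-sided bound of (3.2) deterministically in this example.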

It is interesting to note that the application of a theorem in the theory
of functions of a real variable gives the result that if the function $g(X, \theta)$ is
continuous on a bounded and closed set in $E_s \times E_m$ and if we take for $E_s$ a
bounded and closed set, then $\theta_n^*(X_1, X_2, \cdots, X_n)$ is a continuous function of
$X_1, X_2, \cdots, X_n$ for $X_i \in E_s$ $(i = 1, 2, \cdots, n)$. If we assume the continuity of
$g(X, \theta)$ in X for each $\theta$ in $W(\theta_0, a)$ the following remark demonstrates an interesting
relationship concerning the uniqueness of the solution for $\theta$ in the equation
$E g(X, \theta) = 0$. If in addition to Assumption 3 we assume that $g(X, \theta)$ is
continuous in X for every X in $E_s$ and every $\theta$ in $W(\theta_0, a)$ and if at least one
of the components $g_i(X, \theta)$, $1 \leq i \leq m$, of the m-dimensional vector function
$g(X, \theta)$ satisfies also a Lipschitz condition:

$$\| g_i(X, \theta_1) - g_i(X, \theta_2) - (\theta_1 - \theta_2) \| \leq K \| \theta_1 - \theta_2 \|$$

for every distinct pair $\theta_1$, $\theta_2$ in $W(\theta_0, a)$, then for all $\theta$ in $W(\theta_0, a)$, $\theta_T$ is the unique
solution for $\theta$ of the equation $E g(X, \theta) = 0$.

The proof of this remark is left to the reader.

5. Upper and lower bounds for the expected squared error of $\theta_n^*(X_1, X_2, \cdots, X_n)$.
Denote by $g_i(X, \theta)$, $i = 1, 2, \cdots, m$, the m components of the chance vector
$g(X, \theta)$. We now make an additional assumption.

Assumption 4.

$$E[g_i(X, \theta_T)\, g_j(X, \theta_T)] = \lambda_{ij}$$

exists for $i = 1, 2, \cdots, m$ and $j = 1, 2, \cdots, m$.

It follows from Assumptions 2, 3b, and 4 and the Lindeberg-Lévy form of the
Central Limit Theorem that the vector $\sqrt{n}\, G_n(X, \theta_T)$ tends in probability to an
m-variate normal distribution with means zero and moment matrix $(\lambda_{ij})$.

Now from Assumption 3a and Lemma 4.1

(5.1) $$\frac{\| G_n(X, \theta_T) \|}{1 + K} \leq \| \theta_n^* - \theta_T \| \leq \frac{\| G_n(X, \theta_T) \|}{1 - K}.$$



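For convenience define

$$\lambda = \sum_{i=1}^{m} \lambda_{ii}.$$

We obtain then

$$E \| G_n(X, \theta_T) \|^2 = \frac{\lambda}{n}.$$

It follows then from equation (5.1) that

$$\frac{\lambda}{n(1 + K)^2} \leq E \| \theta_n^* - \theta_T \|^2 \leq \frac{\lambda}{n(1 - K)^2}.$$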




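6. The consistency of maximum likelihood estimates. The results of this
paper can be used to extend the class of consistent maximum likelihood estimates
established heretofore [1].¹ Assume that $F(X, \theta)$ admits a density function
$f(X, \theta)$ with the property

$$\frac{\partial}{\partial \theta} \int f(X, \theta)\, dX = \int \frac{\partial f}{\partial \theta}(X, \theta)\, dX.$$

Then

$$\int \frac{\partial}{\partial \theta} \ln f(X, \theta_T) \; f(X, \theta_T)\, dX = 0.$$

The maximum likelihood estimate of $\theta_T$ is obtained by solving the equation

$$\frac{\partial}{\partial \theta} \ln L(X, \theta) = 0,$$

where

$$L(X, \theta) = \prod_{i=1}^{n} f(X_i, \theta).$$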

If a sample $X_1, X_2, \cdots, X_n$ is obtained as the result of n random independent
drawings from the distribution having the c.d.f. $F(X, \theta)$, the sample values
will satisfy Assumption 2. Assumption 3b holds as assumed above. If we assume
also that the function $\frac{\partial}{\partial \theta} \ln f(X, \theta)$ satisfies Assumption 3a, it follows directly
from Theorem 4.2 that the maximum likelihood estimate converges almost
certainly to the true parameter vector as the sample size approaches infinity.

The author wishes to acknowledge his indebtedness and gratitude to Professor
Jerzy Neyman for the many helpful suggestions made during the preparation
of the paper.

REFERENCES

[1] J. L. Doob, "Probability and statistics," Trans. Am. Math. Soc., Vol. 36 (1934), p. 769.

[2] J. Neyman and Elizabeth L. Scott, "Consistent estimates based on partially consistent
observations," Econometrica, Vol. 16 (1948), pp. 1-32.

[3] A. Wald, "Estimation of a parameter when the number of unknown parameters increases
indefinitely with the number of observations," Annals of Math. Stat.,
Vol. 19 (1948), pp. 220-227.

[4] L. M. Graves, The Theory of Functions of Real Variables, McGraw-Hill Book Co., 1946.

[5] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Chelsea Publishing
Co., 1946.

[6] A. Wald, "Note on the consistency of the maximum likelihood estimate," Annals of
Math. Stat., Vol. 20 (1949), pp. 595-600.

[7] J. Wolfowitz, "On Wald's proof of the consistency of the maximum likelihood estimate,"
Annals of Math. Stat., Vol. 20 (1949), pp. 601-602.

¹ Recently Wald [6] and Wolfowitz [7] have discussed the consistency of the maximum
likelihood estimate from another approach than the one employed by Doob.



DISTRIBUTION OF THE SUM OF ROOTS OF A DETERMINANTAL 
EQUATION UNDER A CERTAIN CONDITION 

By D. N. Nanda 
University of North Carolina 

1. Summary. This paper is in continuation of the author's first two papers
[1] and [2]. In this paper a method is described by which it is possible to derive
the distribution of the sum of roots of a certain determinantal equation under the
condition that m = 0. This condition implies, when the results are applied to
canonical correlations, that the numbers of variates in the two sets differ by
unity. The distributions for the sum of roots under this condition have been
obtained for l = 2, 3 and 4 and are given in this paper. This paper also derives
the moments of these distributions.

2. Introduction. The reader should refer to the first two papers of this series
[1] and [2] for detailed explanation of the preliminaries essential for this paper.

The distribution of any root of the determinantal equation, specified by its
rank when the roots are arranged in a descending order of magnitude, was
derived by the author [1]. The distribution of the largest root was expressed as

(1) < a:) = C{1, m, n)F,,„,,„(a;) == const, {0,1,1 - I, ■ • , 1, a;; m, n). 

3. Method. Putting d, = p,/n in R{1, m, n) as given in [1] and allowing n to 
tend to infinity, tlie distribution density reduces to 

R{1, m) = const. Up”,' n {pi - (0 < p, < pj_i < • ■ • < pi < °o), 

»<J 

where the constant is independent of n, by [2]. If we replace x by x/n in the 
right-hand side of ( 1 ) and allow n to tend to infinity, then the resulting function 
Gi,m{x) is independent of n and it can be shown by comparing the two methods 
A and B in [2], that 

( 2 ) f R{l,m)lldp, = 

This is a constant multiple of 

(3) <i>{x, m) = f np? II (p,. - p,)e-'='’' n dp, 

= const. m). 

Putting p, = xyi , we have 

W f Uy? II {y, — n dy, = const. d{x, m) 




The left-hand side is proportional to the moment generating function for the 
sum of roots when n = 0. 

Let 2/1 = 1 — , 2/2 = 1 — fli-i, • • • , 2/1 = 1 — 01 ; then (4) gives 

(5) [ n(l — 0.)” H (0, — 0j)e"*'’*'**“'n d0, = const. 0(aj, m). 

Let m be changed to n and both sides be multiplied by , then we get 

(6) f n(l - 0.)" n (0. - 0,)e*^'‘ nd0. = const e'*0(a;, n). 

The left-hand side of (6) is the moment generating function for the sum of roots 
when m = 0. 

The method for obtaining the probability distributions is described in detail
for each of the cases l = 2, 3, in the following sections.

It may, however, be added here that the condition m = 0 implies that
$|p - q| = 1$ in the case of canonical correlations. It also implies, in generalized
analysis of variance, that if we have K samples and measurements are made on p
characters then K - 1 and p should differ by unity. Thus the distribution is
given for 5 samples and 3 characters when l = 3 (p = 3).


4. Distribution of the sum of roots when m = 0. 

(a) l = 2. The value of $G_{2,m}(x)$ has been given in [2] as


(7) 


<72,mW = fc(2, m) 


2 f u' 

Jo 


.^m+1 e~ 


u" e “ (ZitJ, 


where K(2, m) = 2^’"'^7r(2m + 2). Then in the notation just given 
<f,(x, = 2 jT jT 

Replacing u by xu, we get 


(2)(x, m) = 2x' 


/' 

Jo 


.2^+2 I y2™+l 


f 

Jo 


. m —XU , 

u e du 


( 8 ) 


Hence 


X 


gm-4>2 -1 2mi-» —X /»* 

1 Jo m + 1 Jo 

2m+3 p .1 d ' 

?_ 2 f e-^ / 

i + 1 L •'0 


2m+2 —X fl 


m + 


m + 


0(x, m) = const. j^2 J e du — e u"'*'^ e duj, 





and according to (6), 


[ n(i 

- 0{r(Bi - 

92)e’‘^^'deidB2 






(9) 

const. e“* 1^2 

r u^"^ du - 

Jq 

1 u e du 


const. ^2 

(1 - - J 

(1 - 

by replacing u by 1 

— u. Or, 



= const. [2 f (1 - 
L Jo 

- uY’^^ e^'‘du - t (1 
Jo 

- uY+^e^du 


The constant can be evaluated by putting $x = 0$.

Then let $\Pr(\theta_1 + \theta_2 < Z) = \text{const.}\,[F_1(Z) - F_2(Z)]$, where $F_1(Z)$ and $F_2(Z)$ are
cumulative distribution functions given by integrating the density $(1 - u)^{2n+2}$ of
$2u$ and $(1 - u)^{n+1}$ of $u$, respectively. It is easily seen that

$$F_2(Z) = \int_0^Z (1 - u)^{n+1}\, du = [1 - (1 - Z)^{n+2}]/(n + 2) \quad (Z < 1).$$

Since $F_1(Z)$ is to be obtained from the density of $2u$, we may substitute $v = 2u$
and then integrate. Thus

$$F_1(Z) = 2 \int_0^Z (1 - v/2)^{2n+2}\, dv/2 = 2[1 - (1 - Z/2)^{2n+3}]/(2n + 3) \quad (Z < 2).$$

Hence the result for l = 2 is

$$\Pr(\theta_1 + \theta_2 < Z) = 2(n + 2)[1 - (1 - Z/2)^{2n+3}] - (2n + 3)[1 - (1 - Z)^{n+2}] \quad (0 < Z < 1),$$

$$= 2(n + 2)[1 - (1 - Z/2)^{2n+3}] - (2n + 3) \quad (1 < Z < 2).$$
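
The l = 2 result can be checked numerically against the first two moments stated in the section on moments, 3/(n + 3) and 6(4n + 11)/[(n + 3)(n + 4)(2n + 5)]. The sketch below is an illustration rather than part of the paper: the function names and the midpoint-rule integration are assumptions, and the moments are computed from the identity $E[Z^r] = \int_0^2 r z^{r-1} (1 - F(z))\, dz$.

```python
def cdf_sum_two_roots(z, n):
    """Pr(theta_1 + theta_2 < z) for l = 2 and m = 0, piecewise on (0, 1) and (1, 2)."""
    if z <= 0.0:
        return 0.0
    if z >= 2.0:
        return 1.0
    first = 2 * (n + 2) * (1.0 - (1.0 - z / 2.0) ** (2 * n + 3))
    if z < 1.0:
        second = (2 * n + 3) * (1.0 - (1.0 - z) ** (n + 2))
    else:
        second = 2 * n + 3
    return first - second


def raw_moment(r, n, steps=20000):
    """E[Z^r] by the midpoint rule applied to r z^(r-1) (1 - F(z)) on (0, 2)."""
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        z = (k + 0.5) * h
        total += r * z ** (r - 1) * (1.0 - cdf_sum_two_roots(z, n))
    return total * h
```

For example, with n = 3 the two branches of the distribution function agree at Z = 1, the function reaches 1 at Z = 2, and `raw_moment(1, 3)` can be compared with 3/(n + 3).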

(b) l = 3. The value of $G_{3,m}(x)$ as given in [2] is changed as

0,,„{x) = K{Z,m) 1^2 jT du ^ du - 2 du 

I 6-"“ du - 2a;*”’+’ / du - 

O' 771 + 1 L •'0 

using (8). 2iC(3,77i) is a constant independent of n. Putting xu for u in only the 
first two terms of the right-hand side of the above equation, we get 





(?3,«.(a:) = k{3, 12 jf' du u”‘e^'‘ du 

- 2 f du [' t 

•'0 Jo m + 1 ^ 


^2«+2g-2x„ 


- e^“dw 

1 Jo 


By integrating by parts we get as a common factor on the right-hand 
side of the above equation. Then according to (5) and (6) we have 

f ^yT n (yi — y])e~'‘^'''ll dy, = const. ( 2(m + 2) 

■'0 <y8<ua<i/i<l i<3 I 


^2».+3g-S.u 


[' du + 2(2m -t- 3)e^ t 

Jo Jo 




- 4(m + 2)e-* jf’ du -|- e"'" £ u”‘+' 6"*“ dw |. 

Putting yi = 1 — 02, y 2 = 1 — 6i, ya = 1 — di and, changing m to n and 
multiplying with c^* we get 

f n(i-0.)"n (e. - 

(11) = const. |2(n + 2) £ du j[' 14 ’*+'du 

-I- 2(2n -b 3) t dw - 4(n + 2) f 

Jo Jo 

+ w"+^ d«|. 

Thus we have 

Pt(0i + 02 + 03 < Z) = const. {Fi(2) + Ft{Z) -|- Fa^Z) 4- Fi[Z)], 

where Fj(Z), F^iZ), F^iZ) and Fi{Z) are the contributions to the cumulative 
distribution by the four terms of the right-hand side of the following equation 


£(e*^‘’’) = const. |2(n + 2) jf (1 - du (1 - du 

-b 2(2u + 3) j[' (1 - du - 4(u -b 2) (1 - u)”"+“e“'' 

+ f a- u)’’+“e^“du|, 


where const. = [(n -f 2)(n + 3)(2u + 6)]. Proceeding according to the method 
given in (a) we have 

(12) F,(Z) = [1 - (1 - Z)"+V(« + 3) (0 < Z < 1), 





(13) 

(14) 

FiiZ) = 2(2n + 3)[1 - (1 - Z/2)'"+V(2n + 5) 
F 3 (Z) = -4(n + 2)[1 - (1 - ZI2f”^\K2n + 4) 

(0 < Z < 2), 
(0 < Z < 2). 


Let us now consider Fi{Z), which is the contribution of the first term. Let 
yi and be distributed between 0 and 1 with densities (1 — and (I — 

Y. 



Fia. 1 


respectively, then 

F,{Z) = 2(n 4- 2) // (1 - - y,y^^ dyx dy^, 

2i/1+»5 ^ 2 

where Z goes from 0 to 3. 

Let us consider the distribution over the unit square OABC, Fig. 1, then for
Z < 1, Z < 2, and Z < 3 we have to integrate over OLM, OCNP, and OCQRA,
where LM, NP and QR are the three lines given by $2y_1 + y_2 = Z$ according as
Z < 1, Z < 2, and Z < 3.

(i) The integration, over OLM is given below 

Fi.iiZ) = 2(2n + 2) jj (1 - - 3 / 2 )"+' dyi dy, for Z < 1, 

SVl+l/sSZ 

or 

— 2 —J l^t/(3-z)(2n + 4, n + 3) 


~ F( 2 -z)/( 3 -z)( 2 n + 4, n + 3)] 





where 

X = B(2n + 4, n + 3) - t - y)"-^ dy 

•'0 

and 

.2/(3-Z> 

X/ 2 /(j- 2 ) = / dy. 

Jo 

(ii) The integration over OCNP is given below. 

(16) Fi,,{Z) = [1 - (1 - Z/2)^"+'‘]/(n + 2)(2n + 4) - 2'''^*[(3 - Z)/2]^”‘+“' 
(5(2n + 4, n 4- 3) — XJ(a_z)/(i_z)(2n + 4, n -1- 3)}/(r!, + 2) (Z < 2). 

(iii) In order to integrate over OCQRA, we shall integrate over the unit area 
OCBA and subtract from this the value obtained by integrating over QRB. 
Thus, 

(17) FM = l/(n + 2)(2n + 4) - 2"+*[(3 - Z)/2]‘’"^> 

B{2n 4“ 4, 4- 3)/(fi 4" 2), 

Hence the result for Z = 3 can be expressed as 

Pt{8i + 02 + 8i < Z) = const. {Fi,i{Z) + F^iZ) 4- Fz{Z) 4- Fi{Z)] 

= const. {2(n 4- 2) {[1 - (1 - Zj^Y^'^y^n + 2)(2« + 4) 
-X-2"+’[^’^"V2f"^[72/(3-z)(2n + 4,n 4- 3) 

— /(2-z)/(3-z)(2n 4- 4, n 4- 3)]/(n 4- 2 )} 
4- 2(2n 4- 3)[1 - (1 - Z/2)^’'+V(2n 4- 5) - 2[1 - (1 - Z/2)“"'-‘l 

+ [1 - (1 - Z)”+V(n + 3)1 (0 < Z < 1), 

and 

= const. {fi, 2 (Z) + Fz{Z') 4" Fzip) + ^ 4 ( 1 )) 

= const. |^2(n + 2) [1 - (1 - Z/2)'"+V(’i + 2)(2 ti + 4) 

_ 2"+^^?^^y"+®[B(2n+ i,n+ 3)- XZ(2_z)/ca-z)(27i + 4, n+ 3)]/(n + 2)J 

+ 2(2n + 3)[1 - (1 - Z/2)'"+V(2n + 5) - 2[1 - (1 - Z/2)''*+*] + l/n+sj 


(1 < Z < 2), 





= const. [Fi.i{Z) + ft(2) + F,(2) + Fiil)\ 


= const. < 2(n + 2)< l/(n + 2)(2n + 4) — 2' 


^ 3 - 2 : 


B{2n + 4,7V -f 3)/(n + 2)| + 2(2n + 3)/(2n + 5) - 2 + l/(n + 3)| 

(2 ^ Z < 3), 

where const. = (tv + 2)(7v + 3)(2n + 5) and \ = B(2n 4- 4, n + 3), 

The exact distribution is obtained for l = 4 by a similar method. The final
results are available from the author and are not given here due to lack of space.

The method given in the above sections can be used to find the distribution
of the sum of roots of a determinantal equation of any order under the condition
m = 0.

5. Moments of the distributions. The moments can be obtained by expanding
the right-hand side of (6) in terms of x and then collecting the coefficients of x.
The moments for l = 2 have been derived here and the method is illustrated
below:

(a) I = 2. Equation (9) gives 

f U(1 - - d 2 )e*^'’'n dff^ = const. (2 (1 - du 

I •'0 


- £ (1 - u)’'+'e‘“ d7i| = const. ^2 jT' (1 - 

-f(i- ± = const. ( 2 1: 

Jo <—0 J 1. ("0 t\ r(27i H” ^ T" 4) 


_ y ?! r« + I)r(n-1- 2) \ _ 
ht\ r(n -h < + 31 / 


2 

2 rt -h 3 


271+4 


+ (2a;)° _ {2^ _ 

f27V + 4)(271 + 5) (2a + 4)(27i + 5)(27v + 6) 


1 

(71 + 2) _ 


1 + 4- 

n + 3 ~ (n + 3) (n + 4) 


-1- 1 _L 

(n + 3) (ra + 4) (n + fi) 


/1 1 3 ^ 12(rv + 2) (4n + 11) 

\ 11 ' (n + 3) 21 (7i+3)(n +4)(27i+4)(2n+5) 

, I* 120(71 + 2)(7i + 3) (471 + 13) , 

3! (271 + 4)(2n + 5)(27i + 6)(7i + 3)(n + 4)(n + 5) 





Hence

$$\mu_1 = 3/(n + 3),$$

$$\mu_2 = 6(4n + 11)/[(n + 3)(n + 4)(2n + 5)]$$

and

$$\mu_3 = 30(4n + 13)/[(n + 3)(n + 4)(n + 5)(2n + 5)].$$

The moments for l = 3 and 4 can be obtained in a similar way.

Acknowledgements. The problem was suggested to me by Dr. P. L. Hsu. 
I take this opportunity to express my gratitude to Dr. P. L. Hsu for guiding me 
in this research. I am also indebted to Dr. Harold Hotelling for help and suggestions
in the work.


REFERENCES 

[1] D. N. Nanda, "Distribution of a root of a determinantal equation," Annals of Math.
Stat., Vol. 19 (1948), pp. 47-67.

[2] D. N. Nanda, "Limiting distribution of a root of a determinantal equation," Annals
of Math. Stat., Vol. 19 (1948), pp. 340-360.



NOTES 

This section is devoted to brief research and expository articles and other short items. 


A NOTE ON THE POWER OF A NON-PARAMETRIC TEST

By F. J. Massey, Jr.

University of Oregon 

1. Introduction. Let $x_1 < x_2 < \cdots < x_n$ be the ordered results of n independent
observations of a random variable X which has a continuous cumulative
distribution function $F(x)$. The following test for the hypothesis that $F(x)$ has
some specified form, say $F_0(x)$, has been suggested by Wolfowitz [1].

Form the cumulative distribution of the sample and obtain the maximum
deviation of this from $F_0(x)$. Thus if

$$S_n(x) = \begin{cases} 0 & \text{when } x < x_1, \\ k/n & \text{when } x_k \leq x < x_{k+1}, \\ 1 & \text{when } x_n \leq x, \end{cases}$$

the test statistic used would be

$$d = \max_x | F_0(x) - S_n(x) | \sqrt{n},$$


and the hypothesis would be rejected if d is large, say larger than $d_\alpha$ which is so
chosen that the probability of a type I error is $\alpha$. The limiting distribution
(as $n \to \infty$) of d has been tabled [2], and a short table of the distribution of d
for various small values of n (n < 80) has been given [3].

The purpose of this note is as follows: 1. A lower bound for the power of the
test is given. 2. This test is shown to be consistent against any continuous alternative
$F(x) = F_1(x)$, where $F_1(x) \neq F_0(x)$. 3. The test is shown to be biased for
finite n. 4. An indication of similar results for a two sample test is given.

2. Lower bound for the power function. Let $\Delta = \max_x | F_0(x) - F_1(x) |$ and
let $x_0$ be a value of x such that $\Delta = | F_0(x_0) - F_1(x_0) |$. The probability that
$d > d_\alpha$ is certainly not less than $\Pr\{\sqrt{n}\, | F_0(x_0) - S_n(x_0) | > d_\alpha\}$. This is the
same as

$$1 - \Pr\{F_0(x_0) - d_\alpha/\sqrt{n} < S_n(x_0) < F_0(x_0) + d_\alpha/\sqrt{n}\},$$

which, since $S_n(x_0)$ is the proportion of observations less than or equal to $x_0$,
is given by the binomial probability law.

If $F(x) = F_1(x)$ the probability of an observation being less than $x_0$ is $F_1(x_0)$.
Since $F_0(x_0) = F_1(x_0) \pm \Delta$ the above probability can be written as follows:



$$1 - \Pr\{F_1(x_0) \pm \Delta - d_\alpha/\sqrt{n} < S_n(x_0) < F_1(x_0) \pm \Delta + d_\alpha/\sqrt{n}\}$$

$$= 1 - \Pr\{\pm\Delta - d_\alpha/\sqrt{n} < S_n(x_0) - F_1(x_0) < \pm\Delta + d_\alpha/\sqrt{n}\}$$

$$= 1 - \Pr\left\{\frac{-d_\alpha \pm \Delta\sqrt{n}}{\sqrt{F_1(x_0)(1 - F_1(x_0))}} < \frac{(S_n(x_0) - F_1(x_0))\sqrt{n}}{\sqrt{F_1(x_0)(1 - F_1(x_0))}} < \frac{d_\alpha \pm \Delta\sqrt{n}}{\sqrt{F_1(x_0)(1 - F_1(x_0))}}\right\}.$$

$\Delta$ is fixed. It has been found [3] by observation for samples of size < 80 that
$d_\alpha$ actually decreases in size as n increases. For sufficiently large n both

$$-d_\alpha \pm \Delta\sqrt{n} \quad \text{and} \quad d_\alpha \pm \Delta\sqrt{n}$$

have the same sign and the law of large numbers indicates that the above probability
approaches zero and the expression approaches unity.

The last expression above can also be used as a lower bound of the power of 
the test for finite n. 
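
The finite-n lower bound described above can be evaluated directly from the binomial law. The following sketch is illustrative and not from the note itself: the function name and arguments are assumptions, the critical value 1.36 used in the usage remark is the familiar limiting 5% point of the statistic rather than a value taken from the tables in [2] or [3], and the interval is treated as closed so that the returned value stays on the conservative side.

```python
import math


def power_lower_bound(n, f0_x0, f1_x0, d_alpha):
    """Lower bound 1 - Pr{F0(x0) - d/sqrt(n) <= S_n(x0) <= F0(x0) + d/sqrt(n)}.

    Under the alternative, n * S_n(x0) is binomial with parameters n and F1(x0).
    """
    half_width = d_alpha / math.sqrt(n)
    lower = f0_x0 - half_width
    upper = f0_x0 + half_width
    inside = 0.0
    for k in range(n + 1):
        if lower <= k / n <= upper:
            inside += math.comb(n, k) * f1_x0 ** k * (1.0 - f1_x0) ** (n - k)
    return 1.0 - inside
```

For instance, `power_lower_bound(200, 0.5, 0.7, 1.36)` evaluates the bound for $\Delta = 0.2$ at n = 200; the bound rises toward one as n grows with $\Delta$ fixed, in line with the consistency argument that follows.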

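For large values of n this probability is given approximately by the normal
distribution. Thus we can write for large n:

$$\text{power} > 1 - \frac{1}{\sqrt{2\pi}} \int_{\lambda_1}^{\lambda_2} e^{-t^2/2}\, dt,$$

where

$$\lambda_1 = (-d_\alpha \pm \Delta\sqrt{n}) / \sqrt{F_1(x_0)(1 - F_1(x_0))}$$

and

$$\lambda_2 = (d_\alpha \pm \Delta\sqrt{n}) / \sqrt{F_1(x_0)(1 - F_1(x_0))}.$$

If n is so large that $\lambda_1$ and $\lambda_2$ are of the same sign and sufficiently different
from zero we can replace $F_1(x_0)$ by 1/2 and not decrease the value of the integral.
In this case we might use as a working formula

$$\lambda_1 = 2(-d_\alpha \pm \Delta\sqrt{n}),$$

$$\lambda_2 = 2(d_\alpha \pm \Delta\sqrt{n}).$$

Since

$$1 - \frac{1}{\sqrt{2\pi}} \int_{\lambda_1}^{\lambda_2} e^{-t^2/2}\, dt$$

approaches one as n tends to infinity, the power, which is larger, must also approach
one, and thus the test is consistent.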

To demonstrate the biasedness of the test for fixed n consider the following
picture.

$F_0(x)$ is shown as a heavy line and an alternative $F_1(x)$ as a dash-dot line.
$F_1(x)$ coincides with $F_0(x)$ except between the points x = a and x = b. If $S_n(x)$
falls outside of the indicated band at any point we agree to reject the hypothesis
$F(x) = F_0(x)$. If $F(x) = F_1(x)$ then $S_n(x)$ has no chance of being outside
the band between x = a and x = c, less chance between x = c and x = b than if
$F(x) = F_0(x)$, and the same chance for x larger than b. This indicates that the
probability of rejecting $F(x) = F_0(x)$, if actually $F(x) = F_1(x)$, is less than
the probability of rejecting $F(x) = F_0(x)$ if this is actually true. Thus the test
is biased.

FIG. 1.

3. Two sample test. Let $S_n(x)$ and $S'_m(x)$ be the cumulative distributions observed
for samples of sizes n and m from two populations having continuous
cumulative distribution functions $F(x)$ and $F'(x)$ respectively. Under the assumption
that $F(x) = F'(x)$ the limiting distribution (as n and m tend to infinity)
of $d' = (n^{-1} + m^{-1})^{-1/2} \max_x | S_n(x) - S'_m(x) |$ has been found and tabled
[4], but the distribution of this statistic for small n and m is not known.

Suppose we wish to test the hypothesis that $F(x) = F'(x)$ at level of significance
$\alpha$ and agree to reject this if $d'$ is larger than $d'_\alpha$, where $d'_\alpha$ is the value
which would be exceeded a proportion $\alpha$ of the time if the hypothesis is true.
The values of $d'_\alpha$ are not known for small samples but are for the limiting case [4].

The same argument as in Section 2 gives a limiting lower bound to the power
of the test in terms of

$$\Delta = | F(x_0) - F'(x_0) |,$$

where $x_0$ is the value of x which maximizes $| F(x) - F'(x) |$, to be

$$1 - \frac{1}{\sqrt{2\pi}} \int_{\lambda'_1}^{\lambda'_2} e^{-t^2/2}\, dt,$$







where 




and 


X' = (d: ^/TTi ± a) + 

\ y n m J / y n m 

Since this lower bound approaches one as n and m approach infinity the power 
also approaches one and the test is consistent. 


REFERENCES 

[1] J. Wolfowitz, "Non-parametric statistical inference," Proceedings of the Symposium on
Mathematical Statistics and Probability, University of California Press, 1949,
pp. 93-113.

[2] N. Smirnov, "Table for estimating the goodness of fit of empirical distributions,"
Annals of Math. Stat., Vol. 19 (1948), pp. 279-281.

[3] F. Massey, "A note on the estimation of a distribution function by confidence limits,"
Annals of Math. Stat., Vol. 21 (1950), pp. 116-120.

[4] N. Smirnov, "On the estimation of the discrepancy between empirical curves of distribution
for two independent samples," Bull. Math. Univ. Moscou, Série Int.,
Vol. 2, fasc. 2 (1939).


ON OPTIMUM SELECTIONS FROM MULTINORMAL POPULATIONS¹

By Z. W. Birnbaum and D. G. Chapman²

University of Washington

1. Introduction. Let $Y_1, Y_2, \cdots, Y_n$ be scores in $n$ admission tests such as those used in educational institutions, personnel selection, or testing of materials, and let these scores be used as a basis for selecting a sub-population $\Pi^*$ from an initial population $\Pi$. This selection is usually performed in such a manner that an achievement or performance score $X$ has a distribution in $\Pi^*$ which shows some required improvement over the distribution of $X$ in $\Pi$; such an improvement may for example consist in changing the expectation $E(X)$ of $X$ in $\Pi$ to a pre-assigned value $E^*(X)$ in $\Pi^*$. Among all selection procedures based on $Y_1, \cdots, Y_n$ and achieving the required improvement of the distribution of $X$, it appears desirable to find those which retain as large a portion of $\Pi$ as possible. It will be shown that under certain assumptions the linear truncations studied in an earlier paper [1] are such optimal selections.

¹ Presented at the New York meeting of the Institute of Mathematical Statistics on December 27, 1949.

² Research done under the sponsorship of the Office of Naval Research.

2. Selection, truncation, linear truncation. Let the frequency of individuals with the scores $(X, Y_1, \cdots, Y_n)$ be $F(X, Y_1, \cdots, Y_n)$ in $\Pi$ and $F^*(X, Y_1, \cdots, Y_n)$ in $\Pi^*$. Since $\Pi^*$ was obtained by selection from $\Pi$, we have $F^*/F \le 1$, and since the selection was made solely on the basis of the values of $Y_1, \cdots, Y_n$, the ratio $F^*/F$ is independent of $X$. We thus have


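$$\frac{F^*(X, Y_1, \cdots, Y_n)}{F(X, Y_1, \cdots, Y_n)} = \varphi(Y_1, \cdots, Y_n)$$

and

(2.1) $0 \le \varphi(Y_1, \cdots, Y_n) \le 1$.

Let $N = \iint \cdots \int F(X, Y_1, \cdots, Y_n)\, dX\, dY_1 \cdots dY_n$ and $N^* = \iint \cdots \int F^*(X, Y_1, \cdots, Y_n)\, dX\, dY_1 \cdots dY_n$ be the number of individuals in $\Pi$ and $\Pi^*$, and $f(X, Y_1, \cdots, Y_n)$ and $f^*(X, Y_1, \cdots, Y_n)$ the distribution densities in $\Pi$ and $\Pi^*$, respectively, so that $F = Nf$, $F^* = N^* f^*$ and $\iint \cdots \int f\, dX\, dY_1 \cdots dY_n = \iint \cdots \int f^*\, dX\, dY_1 \cdots dY_n = 1$. We then have

$$N^* f^* = \varphi N f,$$

and

(2.2) $\dfrac{N^*}{N} = \iint \cdots \int \varphi(Y_1, \cdots, Y_n)\, f(X, Y_1, \cdots, Y_n)\, dX\, dY_1 \cdots dY_n$.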


Thus any selection of a subpopulation $\Pi^*$ from $\Pi$ based only on $Y_1, \cdots, Y_n$ defines a $\varphi(Y_1, \cdots, Y_n)$ satisfying (2.1). Conversely, if the frequencies $F(X, Y_1, \cdots, Y_n)$ in $\Pi$ are given, any measurable $\varphi(Y_1, \cdots, Y_n)$ satisfying (2.1) defines new frequencies $F^* = \varphi F$ and hence a selection from $\Pi$ based only on $Y_1, \cdots, Y_n$.

These considerations lead to the following definitions:

A measurable function $\varphi(Y_1, \cdots, Y_n)$ which satisfies (2.1) is called a selection in $Y_1, \cdots, Y_n$. If, in particular, $\varphi$ is the characteristic function of a set $\Omega$ in $(Y_1, \cdots, Y_n)$, that is $\varphi = 1$ in $\Omega$ and $\varphi = 0$ in the complement of $\Omega$, then the selection $\varphi$ will be called a truncation in $Y_1, \cdots, Y_n$ to the set $\Omega$. If $\Omega$ is defined by a condition of the form

$$\sum_{i=1}^{n} a_i Y_i \ge t$$

with constant $a_i$, $t$, then the truncation to the set $\Omega$ will be called a linear truncation in $Y_1, \cdots, Y_n$.
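
The definitions above can be illustrated by a small sketch in Python. The finite population, the weights and the cutting score below are hypothetical, and the non-strict inequality in the truncation is an assumption of this sketch; the fraction retained is computed as the discrete analogue of $N^*/N$ in (2.2).

```python
def linear_truncation(weights, t):
    """Selection phi equal to 1 where sum(a_i * Y_i) >= t and 0 elsewhere."""
    def phi(scores):
        total = sum(a * y for a, y in zip(weights, scores))
        return 1.0 if total >= t else 0.0
    return phi

def fraction_retained(phi, population):
    """Discrete analogue of N*/N: the average of phi over all individuals."""
    return sum(phi(scores) for scores in population) / len(population)

# Hypothetical population: each tuple holds the admission scores (Y_1, Y_2).
population = [(1.0, 2.0), (0.5, 0.5), (2.0, 2.5), (0.0, 3.0), (1.5, 0.2)]
phi = linear_truncation(weights=(1.0, 1.0), t=2.5)
retained = fraction_retained(phi, population)
```

Any function with values between 0 and 1 could be passed to `fraction_retained`; a truncation is the special case in which only the values 0 and 1 occur.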

In view of (2.2) we will refer to

(2.3) $r(\varphi) = \iint \cdots \int \varphi(Y_1, \cdots, Y_n)\, f(X, Y_1, \cdots, Y_n)\, dX\, dY_1 \cdots dY_n$

as the fraction retained in the selection $\varphi$.


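3. A lemma. We will need the following slight generalization of the fundamental lemma of Neyman-Pearson (cf. [2]).

LEMMA. Let $G(Y_1, \cdots, Y_n)$, $G_1(Y_1, \cdots, Y_n)$, $\cdots$, $G_m(Y_1, \cdots, Y_n)$ be given integrable functions and $c_1, \cdots, c_m$ given constants, and let $(\varphi)$ be the family of all measurable functions $\varphi(Y_1, \cdots, Y_n)$ which satisfy the conditions

(3.1) $0 \le \varphi(Y_1, \cdots, Y_n) \le 1$,

(3.2) $\displaystyle\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi(Y_1, \cdots, Y_n)\, G_i(Y_1, \cdots, Y_n)\, dY_1 \cdots dY_n = c_i$ for $i = 1, \cdots, m$.

If there exist constants $k_1, \cdots, k_m$ such that the characteristic function $\varphi_0(Y_1, \cdots, Y_n)$ of the set

$$E = \Big\{ (Y_1, \cdots, Y_n) : G \ge \sum_{i=1}^{m} k_i G_i \Big\}$$

belongs to $(\varphi)$, then

(3.3) $\displaystyle\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi_0\, G\, dY_1 \cdots dY_n \ge \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi\, G\, dY_1 \cdots dY_n$

for any $\varphi$ in $(\varphi)$.

Proof: We have $\varphi_0 = 1 \ge \varphi$ in $E$ and $\varphi_0 = 0 \le \varphi$ in the complement of $E$, hence

$$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} (\varphi_0 - \varphi) \Big( G - \sum_{i=1}^{m} k_i G_i \Big)\, dY_1 \cdots dY_n \ge 0,$$

and (3.3) follows since $\varphi_0$ and $\varphi$ fulfill (3.2).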


4. Selection from a multivariate normal population, for which the fraction retained is maximum. From now on we assume that the conditional distribution of $X$ for given $Y_1, Y_2, \cdots, Y_n$ is normal with a mean which is a linear function of the $Y$'s and with a variance which is independent of them, i.e.,

(4.1) $f(X \mid Y_1, Y_2, \cdots, Y_n) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\dfrac{\big(X - \sum_{i=1}^{n} \beta_i Y_i\big)^2}{2\sigma^2} \right]$.

Let $Q(Y_1, \cdots, Y_n)$ denote the marginal density of $Y_1, \cdots, Y_n$.

THEOREM 1. A selection such that

1° in $\Pi^*$ a proportion at most equal to a given proper fraction $\epsilon$ has values of $X$ below $x_0$, i.e. the $\epsilon$-quantile in $\Pi^*$ is greater than or equal to $x_0$, where $x_0$ is a given number greater than the $\epsilon$-quantile in $\Pi$,

2° the fraction retained is maximum,

is a linear truncation.


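Proof: We have to maximize

(4.2) $r(\varphi) = \displaystyle\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi(Y_1, \cdots, Y_n)\, Q(Y_1, \cdots, Y_n)\, dY_1 \cdots dY_n$

under the condition

$$\frac{\displaystyle\int_{-\infty}^{x_0} \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi(Y_1, \cdots, Y_n)\, Q(Y_1, \cdots, Y_n)\, f(X \mid Y_1, \cdots, Y_n)\, dY_1 \cdots dY_n\, dX}{\displaystyle\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi(Y_1, \cdots, Y_n)\, Q(Y_1, \cdots, Y_n)\, f(X \mid Y_1, \cdots, Y_n)\, dY_1 \cdots dY_n\, dX} \le \epsilon .$$

Substituting the expression (4.1) for $f(X \mid Y_1, \cdots, Y_n)$ and integrating with respect to $X$ we may rewrite this in the form

(4.3) $L(\varphi) = \displaystyle\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi(Y_1, \cdots, Y_n)\, Q(Y_1, \cdots, Y_n) \left[ \Psi\!\left( \frac{x_0 - \sum_{i=1}^{n} \beta_i Y_i}{\sigma} \right) - \epsilon \right] dY_1 \cdots dY_n \le 0,$

where

$$\Psi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-t^2/2}\, dt,$$

and we have to maximize (4.2) under condition (4.3).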

Without loss of generality the inequality $L(\varphi) \le 0$ in (4.3) may be replaced by equality. For if we had a selection $\varphi_1$ which maximizes (4.2) and satisfies (4.3) with a strict inequality $L(\varphi_1) < 0$, then $\varphi_1$ could not be equal to 1 almost everywhere since then we would have $F^* = F$ almost everywhere and $x_0$ would be equal to the $\epsilon$-quantile in $\Pi$, in contradiction with 1°; hence $\varphi_2 = \varphi_1 + \alpha(1 - \varphi_1)$ for sufficiently small $\alpha > 0$ would also satisfy (4.3) with a strict inequality but would yield $r(\varphi_2) > r(\varphi_1)$.

To solve our problem we now have to maximize (4.2) under the condition

(4.4) $L(\varphi) = 0$.

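Applying the lemma of Section 3, with $m = 1$, and

$$G(Y_1, \cdots, Y_n) = Q(Y_1, \cdots, Y_n),$$

$$G_1(Y_1, \cdots, Y_n) = Q(Y_1, \cdots, Y_n) \left[ \Psi\!\left( \frac{x_0 - \sum_{i=1}^{n} \beta_i Y_i}{\sigma} \right) - \epsilon \right],$$

we conclude that the selection satisfying 1° and 2° will be the characteristic function $\varphi_0(Y_1, \cdots, Y_n)$ of the set defined by

(4.5) $k \left[ \Psi\!\left( \dfrac{x_0 - \sum_{i=1}^{n} \beta_i Y_i}{\sigma} \right) - \epsilon \right] \le 1,$

provided $k$ can be determined so that $\varphi_0$ satisfies (4.4).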



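To find such a $k$ we consider

$$I(t) = \int \cdots \int_{\sum \beta_i Y_i \ge t} Q(Y_1, \cdots, Y_n) \left[ \Psi\!\left( \frac{x_0 - \sum_{i=1}^{n} \beta_i Y_i}{\sigma} \right) - \epsilon \right] dY_1 \cdots dY_n .$$

As $t$ tends to $-\infty$, $I(t)$ tends to $L(1)$, where $L$ was defined by (4.3). Since the $\epsilon$-quantile in $\Pi$ was less than $x_0$ it follows that $I(-\infty) = L(1) > 0$. Since $I(t) < 0$ for large $t$, there exists $t_0$ such that $I(t_0) = 0$, and clearly, $\Psi((x_0 - t_0)/\sigma) - \epsilon > 0$.

Setting in (4.5) $k = [\Psi((x_0 - t_0)/\sigma) - \epsilon]^{-1}$, one obtains a $\varphi_0$ such that

$$L(\varphi_0) = I(t_0) = 0.$$

The selection $\varphi_0$ is the linear truncation to the set $\sum_{i=1}^{n} \beta_i Y_i \ge t_0$.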

By a similar and somewhat simpler argument one proves the following theorem.

THEOREM 2. A selection such that

1° in $\Pi^*$ the mean of $X$ has a value greater than or equal to a pre-assigned number $m > 0$,

2° the fraction retained is maximum,

is a linear truncation to a set $\sum_{i=1}^{n} \beta_i Y_i \ge t_0$.

An immediate consequence of Theorems 1 and 2 is that a linear truncation, using a properly determined weighted score $\sum \beta_i Y_i$ and cutting score $t_0$, is more economical than any truncation to a set $Y_i \ge t_i$, $i = 1, 2, \cdots, n$, that is than any truncation performed on each admission score separately.
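
The economy of the linear truncation can be checked numerically in a simple special case. The sketch below is only an illustration under assumptions that are not in the paper: $n = 2$, the scores $Y_1$, $Y_2$ are independent standard normal, the conditional mean of $X$ is $Y_1 + Y_2$, the required mean is $m = 2$, and the separate truncation uses the same cut-off on both scores. All function names are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def hazard(z):
    """E(Z | Z > z) for a standard normal Z."""
    return normal_pdf(z) / upper_tail(z)

def solve_hazard(target, lo=-8.0, hi=8.0):
    """Bisection for hazard(z) = target; hazard is increasing in z."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if hazard(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

required_mean = 2.0  # assumed pre-assigned mean of X in the selected group

# Linear truncation Y1 + Y2 >= t: the sum is N(0, 2), so the conditional
# mean of X equals sqrt(2) * hazard(t / sqrt(2)).
z_linear = solve_hazard(required_mean / math.sqrt(2.0))
retained_linear = upper_tail(z_linear)

# Separate cut-offs Y1 >= c and Y2 >= c: each score contributes hazard(c).
c = solve_hazard(required_mean / 2.0)
retained_separate = upper_tail(c) ** 2
```

Under these assumptions both procedures reach the same mean of $X$, and `retained_linear` exceeds `retained_separate`, in agreement with the consequence drawn from Theorem 2.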

REFERENCES 

[1] Z. W. Birnbaum, "Effect of linear truncation on a multinormal population," Annals of Math. Stat., Vol. 21 (1950), pp. 272-279.

[2] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses," Stat. Res. Memoirs, Vol. I (1936), pp. 1-37, particularly pp. 10-11.


THE DISTRIBUTION OF DISTANCE IN A HYPERSPHERE 
By J. M. Hammersley
University of Oxford 

1. Summary. Deltheil ([1], pp. 114-120) has considered the distribution of distance in an $n$-dimensional hypersphere. In this paper I put his results (17) in a more compact form (16); and I investigate in greater detail the asymptotic form of the distribution for large $n$, for which the rather surprising result emerges that this distance is almost always nearly equal to the distance between the extremities of two orthogonal radii. I came to study this distribution by the need to compute a doubly-threefold integral, which measures the damage caused to plants by the presence of radioactive tracers in their fertilizers; for the distribution affords a method of evaluating numerically certain multiple integrals. I hope to describe elsewhere this application of the theory.


2. Derivation of the frequency function. Let $T_1$ and $T_2$ be vector spaces of $n$ and $2n$ dimensions respectively. Let $P$ and $Q$ be any pair of points in $T_1$. Denote by $(PQ)$ the point in $T_2$ whose first $n$ coordinates are the coordinates of $P$ in $T_1$ and whose last $n$ coordinates are the coordinates of $Q$ in $T_1$. Let $\{P\}$ and $\{Q\}$ be point sets in $T_1$, and let $\{PQ\}$ be the point set in $T_2$ such that $(PQ) \in \{PQ\}$ if and only if both $P \in \{P\}$ and $Q \in \{Q\}$. Let $M_1\{P\}$ denote the $n$-dimensional measure of the point set $\{P\}$ in $T_1$, and let $M_2\{PQ\}$ denote the $2n$-dimensional measure of the point set $\{PQ\}$ in $T_2$. Then

(1) $M_2\{PQ\} = \displaystyle\int_{\{P\}} M_1\{Q\}\, dM_1\{P\}.$

Let $R$ be a fixed point in $T_1$, and let $S_n(a)$ be the $n$-dimensional hypersphere in $T_1$ with centre $R$ and radius $a$. Let $A$ and $B$ be any two points chosen at random in $S_n(a)$, the distributions of $A$ and $B$ being independent and uniform over the interior of $S_n(a)$. Denote the distance $AB$ by $r$, and let $\lambda = r/2a$, so that $\lambda$ may take any value in the interval $0 \le \lambda \le 1$. We require the frequency function of $\lambda$, which we shall denote by $f_n(\lambda)$.

The volume content of $S_n(a)$ is

(2) $V_n(a) = \pi^{n/2} a^n / \Gamma(\tfrac12 n + 1)$;

and the content of the segment of the surface of $S_n(a)$ bounded by a right hyperspherical cone, whose vertex is at $R$ and whose line generators make a fixed semi-vertical angle $\theta$ with a fixed radius of $S_n(a)$, is

(3) $U_n(a, \theta) = \dfrac{2\pi^{(n-1)/2} a^{n-1}}{\Gamma(\tfrac12 n - \tfrac12)} \displaystyle\int_0^{\theta} \sin^{n-2}\phi\, d\phi .$

As a particular case of (3), the whole surface of $S_n(a)$ has content

(4) $U_n(a, \pi) = 2\pi^{n/2} a^{n-1} / \Gamma(\tfrac12 n)$.

Let $\{AB\}$ be the point set in $T_2$ such that $(AB) \in \{AB\}$ if and only if the corresponding points $A$ and $B$ satisfy all the inequalities

(5) $0 \le RA \le a$, $0 \le RB \le a$, $r \le AB \le r + dr$.

Then, by the definition of $f_n(\lambda)$,

$$M_2\{AB\} \propto f_n(r/2a)\, dr;$$

but since

$$\int_0^{2a} M_2\{AB\} = V_n^2, \qquad \int_0^{2a} f_n(r/2a)\, dr/2a = 1,$$

we have

(6) $M_2\{AB\} = V_n^2 f_n(r/2a)\, dr/2a = p_n(r, a)\, dr$, say.

Consider also the point set $\{CD\}$ in $T_2$ such that $(CD) \in \{CD\}$ if and only if the corresponding points $C$ and $D$ satisfy all the inequalities

(7) $0 \le RC \le a + da$, $a \le RD \le a + da$, $r \le CD \le r + dr$.

For each fixed $D$ of $\{D\}$, $C$ is constrained to lie on the segment of the hyperspherical shell of thickness $dr$, radius $r$, and centre $D$, bounded by the intersection of this shell with $S_n(a + da)$. The hyperspherical cone, with vertex $D$, whose line generators all pass through this intersection, has a semi-vertical angle $\theta$ given by

(8) $\cos\theta = r/2a = \lambda$,

and so, from (3), the $M_1$ of all $C$ which satisfy (7) for each fixed $D$ is $U_n(r, \arccos\lambda)\, dr$. On the other hand the $M_1$ of all $D$ which satisfy (7) is the content of the hyperspherical shell of thickness $da$, radius $a$, and centre $R$, and is thus $U_n(a, \pi)\, da$ by virtue of (4). Consequently, from (1)

(9) $M_2\{CD\} = U_n(r, \arccos\lambda)\, U_n(a, \pi)\, da\, dr$.

On the other hand, by symmetry, $M_2\{CD\} = \tfrac12 M_2\{EF\}$, where $(EF) \in \{EF\}$ if and only if the corresponding points $E$ and $F$ satisfy either all the inequalities

$$0 \le RE \le a + da, \quad a \le RF \le a + da, \quad r \le EF \le r + dr,$$

or all the inequalities

$$0 \le RF \le a + da, \quad a \le RE \le a + da, \quad r \le EF \le r + dr.$$

We can express this in another way by saying that $(EF) \in \{EF\}$ if and only if the corresponding points $E$ and $F$ satisfy all the inequalities

$$0 \le RE \le a + da, \quad 0 \le RF \le a + da, \quad r \le EF \le r + dr,$$

but do not satisfy all the inequalities

$$0 \le RE \le a, \quad 0 \le RF \le a, \quad r \le EF \le r + dr.$$

From this second point of view we see that

$$M_2\{EF\} = p_n(r, a + da)\, dr - p_n(r, a)\, dr = \frac{\partial}{\partial a}\, p_n(r, a)\, dr\, da;$$

and so

(10) $M_2\{CD\} = \tfrac12 \dfrac{\partial}{\partial a}\, p_n(r, a)\, dr\, da$.


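Then from (2), (3), (4), (6), (9), and (10),

(11) $\dfrac12 \dfrac{\partial}{\partial a} \left\{ \left[ \dfrac{\pi^{n/2} a^n}{\Gamma(\tfrac12 n + 1)} \right]^2 \dfrac{f_n(r/2a)}{2a} \right\} = \dfrac{2\pi^{(n-1)/2} r^{n-1}}{\Gamma(\tfrac12 n - \tfrac12)} \displaystyle\int_0^{\arccos\lambda} \sin^{n-2}\theta\, d\theta \cdot \dfrac{2\pi^{n/2} a^{n-1}}{\Gamma(\tfrac12 n)} .$

By performing the partial differentiation on the left-hand side, then substituting $z = \cos\theta$ and $r = 2a\lambda$, and using the relations

$$\Gamma(\tfrac12 n + 1) = \tfrac12 n\, \Gamma(\tfrac12 n), \qquad \pi^{1/2}\, \Gamma(n + 1) = 2^n\, \Gamma(\tfrac12 n + \tfrac12)\, \Gamma(\tfrac12 n + 1),$$

$$B(\tfrac12 n + \tfrac12, \tfrac12 n + \tfrac12) = \{\Gamma(\tfrac12 n + \tfrac12)\}^2 / \Gamma(n + 1),$$

we reduce (11) to the form

(12) $(2n - 1) f_n(\lambda) - \lambda f_n'(\lambda) = \dfrac{2n(n - 1)\lambda^{n-1}}{B(\tfrac12 n + \tfrac12, \tfrac12 n + \tfrac12)} \displaystyle\int_\lambda^1 (1 - z^2)^{(n-3)/2}\, dz .$

We multiply (12) by $-\lambda^{-2n}$ and use the reduction formula

(13) $(n - 1) \displaystyle\int_\lambda^1 (1 - z^2)^{(n-3)/2}\, dz = n \int_\lambda^1 (1 - z^2)^{(n-1)/2}\, dz + \lambda (1 - \lambda^2)^{(n-1)/2} .$

Each side of the resulting equation is a perfect differential coefficient, and upon integration we obtain

(14) $f_n(\lambda) = \dfrac{2n\lambda^{n-1}}{B(\tfrac12 n + \tfrac12, \tfrac12 n + \tfrac12)} \displaystyle\int_\lambda^1 (1 - z^2)^{(n-1)/2}\, dz + C\lambda^{2n-1},$

where $C$ is the constant of integration. We obtain the cumulative distribution function by integrating (14) over 0 to $\lambda$,

(15) $F_n(\lambda) = (2\lambda)^n I_{1-\lambda^2}(\tfrac12 n + \tfrac12, \tfrac12) + I_{\lambda^2}(\tfrac12 n + \tfrac12, \tfrac12 n + \tfrac12) + C\lambda^{2n}/2n,$

where $I_x(p, q)$ is the incomplete beta-function ratio

$$I_x(p, q) = \int_0^x z^{p-1} (1 - z)^{q-1}\, dz / B(p, q)$$

tabulated by Pearson [2]. Putting $\lambda = 1$ in (15) we get

$$1 = F_n(1) = 1 + C/2n;$$

so $C = 0$, and we have the final result

(16) $f_n(\lambda) = 2^n n \lambda^{n-1} I_{1-\lambda^2}(\tfrac12 n + \tfrac12, \tfrac12)$.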


This compact form may be compared with Deltheil's expression [1] for the frequency function of $r$, namely



where 


pix—l 

hni2 sin e) = / 

^0 


sm" ’*0 


/r 


sin" d<l>, 


expressions which he evaluates only for the particular cases $n = 3, 5, 7, 9$. Interesting particular cases of (16) are

(18) $f_1(\lambda) = 2(1 - \lambda), \qquad f_2(\lambda) = \dfrac{16}{\pi} \lambda \{ \arccos\lambda - \lambda (1 - \lambda^2)^{1/2} \}, \qquad f_3(\lambda) = 12\lambda^2 (1 - \lambda)^2 (2 + \lambda),$

which give the appropriate frequency functions for a line, a circle, and a sphere respectively.
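
The agreement between the general formula (16) and the three particular cases can be checked numerically. The following Python sketch is an illustration only: the function names are hypothetical, and the incomplete beta-function ratio is evaluated by Simpson's rule after the substitution $z = \sin^2\theta$, which turns the integral into $2\int_0^{\arccos\lambda} \sin^n\theta\, d\theta$.

```python
import math

def simpson(func, a, b, steps=2000):
    """Composite Simpson rule; `steps` must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def beta(p, q):
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)

def f_n(n, lam):
    """Formula (16): 2^n n lam^(n-1) I_{1-lam^2}(n/2 + 1/2, 1/2)."""
    # With z = sin(theta)^2 the incomplete beta integral becomes
    # 2 * integral_0^{arccos(lam)} sin(theta)^n d(theta), a smooth integrand.
    integral = 2.0 * simpson(lambda th: math.sin(th) ** n, 0.0, math.acos(lam))
    ratio = integral / beta(0.5 * n + 0.5, 0.5)
    return 2 ** n * n * lam ** (n - 1) * ratio

def f_1(lam):
    return 2.0 * (1.0 - lam)

def f_2(lam):
    return 16.0 / math.pi * lam * (math.acos(lam) - lam * math.sqrt(1.0 - lam * lam))

def f_3(lam):
    return 12.0 * lam ** 2 * (1.0 - lam) ** 2 * (2.0 + lam)
```

Evaluating `f_n(1, lam)`, `f_n(2, lam)` and `f_n(3, lam)` at a few values of `lam` between 0 and 1 and comparing with `f_1`, `f_2` and `f_3` gives a direct check of (18).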

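3. Recurrence relations and moments of the distribution. From (13) and (14) we have a recurrence relation for penadjacent values of $n$,

(19) $f_n(\lambda) = \dfrac{4n}{n - 2}\, \lambda^2 f_{n-2}(\lambda) - \dfrac{2\,\Gamma(n + 1)}{\{\Gamma(\tfrac12 n + \tfrac12)\}^2}\, \lambda^n (1 - \lambda^2)^{(n-1)/2} .$

In connection with (18) this shows that

(20) $f_{2n+1}(\lambda) = P_{4n+1}(\lambda), \qquad f_{2n}(\lambda) = P_{4n-1}(\lambda) \arccos\lambda + P_{4n-2}(\lambda)(1 - \lambda^2)^{1/2},$

where $P_N(\lambda)$ denotes an unspecified polynomial in $\lambda$ of degree $N$ or less.

From (16) the $r$th moment of $f_n(\lambda)$ about $\lambda = 0$ is

(21) $\mu_r' = \left\{ \dfrac{n\,\Gamma(n + 1)}{\Gamma(\tfrac12 n + \tfrac12)} \right\} \left\{ \dfrac{\Gamma(\tfrac12 n + \tfrac12 r + \tfrac12)}{(n + r)\,\Gamma(n + \tfrac12 r + 1)} \right\} .$

I have not been able to obtain the characteristic function of $f_n(\lambda)$ explicitly; from (21) it appears to be of a higher type than the hypergeometric function.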

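4. The asymptotic form of the distribution for large n. The distribution function is, by (15),

(22) $F_n(\lambda) = (2\lambda)^n I_{1-\lambda^2}(\tfrac12 n + \tfrac12, \tfrac12) + I_{\lambda^2}(\tfrac12 n + \tfrac12, \tfrac12 n + \tfrac12)$.

We show firstly that as $n \to \infty$ the first term of this expression tends to zero. This term is clearly zero if $\lambda = 0$. If $\lambda > 0$

$$\int_0^{1-\lambda^2} z^{(n-1)/2} (1 - z)^{-1/2}\, dz \le \lambda^{-1} \int_0^{1-\lambda^2} z^{(n-1)/2}\, dz = (1 - \lambda^2)^{(n+1)/2} / \tfrac12 (n + 1) \lambda .$$

Hence

$$(2\lambda)^n I_{1-\lambda^2}(\tfrac12 n + \tfrac12, \tfrac12) \le \frac{(2\lambda)^n\, \Gamma(\tfrac12 n + 1)}{\pi^{1/2}\, \Gamma(\tfrac12 n + \tfrac12)} \cdot \frac{(1 - \lambda^2)^{(n+1)/2}}{(\tfrac12 n + \tfrac12) \lambda} = \frac{2\,\Gamma(\tfrac12 n + 1)}{\pi^{1/2}\, \Gamma(\tfrac12 n + \tfrac32)} (1 - \lambda^2) \{ 4\lambda^2 (1 - \lambda^2) \}^{(n-1)/2} \le \frac{2\,\Gamma(\tfrac12 n + 1)}{\pi^{1/2}\, \Gamma(\tfrac12 n + \tfrac32)} \to 0$$

as $n \to \infty$. Secondly, as $n \to \infty$

$$I_{\lambda^2}(\tfrac12 n + \tfrac12, \tfrac12 n + \tfrac12) \sim N_{\lambda^2}(\tfrac12, 1/4(n + 1)) \sim N_{\lambda^2}(\tfrac12, 1/4n),$$

(see Cramér [3] p. 252 with $p = q = \tfrac12$), where $N_x(\mu, \sigma^2)$ is the normal cumulative distribution function of $x$ for mean $\mu$ and variance $\sigma^2$. Hence $\lambda$ is asymptotically distributed as $N_\lambda(1/\sqrt2, 1/8n)$; and the asymptotic distribution of $r$ is $N_r(a\sqrt2, a^2/2n)$. This establishes the result stated in the summary.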
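
The asymptotic statement can be illustrated by simulation. The sketch below is not from the paper: it assumes a unit radius $a = 1$, draws uniform points in the $n$-ball by the standard device of a Gaussian direction scaled by $U^{1/n}$, and uses a fixed seed and hypothetical function names. It requires Python 3.8 or later for `math.dist`.

```python
import math
import random

def random_point_in_ball(n, rng):
    """Uniform point in the unit n-ball: Gaussian direction, radius U^(1/n)."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in direction))
    radius = rng.random() ** (1.0 / n)
    return [radius * c / norm for c in direction]

def simulate_lambda(n, trials, seed=0):
    """Draw `trials` values of lambda = r / 2a for a ball of radius a = 1."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        p = random_point_in_ball(n, rng)
        q = random_point_in_ball(n, rng)
        values.append(math.dist(p, q) / 2.0)
    return values

n = 100
values = simulate_lambda(n, trials=2000)
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
```

For large $n$ the sample mean should lie close to $1/\sqrt2$ and the sample variance close to $1/8n$, which is the sense in which the distance is nearly that between the extremities of two orthogonal radii.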





It can also be proved, by considering the limiting form of the recurrence relation (19), that the frequency function $f_n$ is asymptotically normal. The main difficulty of proving this fact lies in showing that the frequency function actually possesses a limiting form; and the proof is rather too long to be given here.

REFERENCES 

[1] R. Deltheil, Probabilités Géométriques. Traité du Calcul des Probabilités et de ses Applications: Tome II, Fascicule II, Gauthier-Villars, 1926.

[2] K. Pearson, Tables of the Incomplete Beta-Function, Cambridge University Press, 1934.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.


A NOTE ON THE ASYMPTOTIC SIMULTANEOUS DISTRIBUTION OF 
THE SAMPLE MEDIAN AND THE MEAN DEVIATION FROM 
THE SAMPLE MEDIAN 

By R. K. Zeigler

Bradley University 

Consider a random sample of $2k + 1$ values from a one-dimensional distribution of the continuous type with cumulative distribution function (cdf) $F(x)$ and probability density function (pdf) $f(x) = F'(x)$. Let the mean, standard deviation and median of the distribution be denoted by $m$, $\sigma$ and $\theta$ respectively ($\theta$ assumed to be unique). We shall suppose that in some neighborhood of $x = \theta$, $f(x)$ has a continuous derivative $f'(x)$.

If we arrange the sample values in ascending order of magnitude:

$$x_1 < x_2 < \cdots < x_{2k+1},$$

there is a unique sample median $x_{k+1}$ which we shall denote by $\xi$. The mean deviation from the sample median is then defined by

$$M = \frac{1}{2k} \sum_{i=1}^{2k+1} | x_i - \xi | .$$

In the material that follows we shall assume that the sample items have been ordered only to the extent that $k$ of them are less than $\xi$ and $k$ of them are greater than $\xi$.

We then have the following

THEOREM. Let $f(x)$ be a pdf with finite second moment, continuous at $x = \theta$ with $f(\theta) \ne 0$. Then the simultaneous distribution of $\xi$ and $M$ is asymptotically normal. The means of the limiting distribution are $\theta$, the population median, and $u'$, the mean deviation from the population median, while the asymptotic variances are $1/4f^2(\theta)2k$ and $((m - \theta)^2 + \sigma^2 - u'^2)/2k$. The asymptotic expression for the correlation coefficient is $(m - \theta)/\sqrt{(m - \theta)^2 + \sigma^2 - u'^2}$.

Proof: Let $u = (M - u')\sqrt{2k}$ and $v = (\xi - \theta)\sqrt{2k}$, where $u' = E | x - \theta |$. Then the simultaneous characteristic function of the two random variables $u$




and V is given by the following; 

_ S exp [«. (4 S I£ I - •') + '«* - 

d-ri-c 


(2k + 1)! 

(kir 




' 2H-1 * 

exp ^ 

iti] 

X Xi — ^X, 
i-1 

[ 2/c 


— u'j d" ~ ^)\/^J 


dxjt+i ■ • ■ daiji+sdxt • • • dxid£ 


(2k +1)! r f 


til 


(a: + w')| /(a) 

Upon making the substitution $\xi = \theta + y/\sqrt{2k}$, the above expression can be reduced to the following form:

_ ( 2 k + 1)1 r fr [' \^j^(x + uo 1 m dx 

(1) ^(ti, fs) v'2/t(fc0'^ IL-L- L -1 


f 


w. 

r iii 


•H-to/V*® 


exp 




exp 




(a: 


- u') J(x) 


dx 


- Je 




Now 


• «'"V (" + *■ 

iNOW a 

_ X f' (. + »•)■/(*) * + 

2(2fc) JL« 





and 

/; 


exp 


iii 

LV2it 


ill r 

{x - u') fix) dx = \ ~ '^'Vi'x) dx 

-mi + 


f 2 ( 2 /c, k) 


2k 


where for every fixed $t_1$, $\epsilon_1(2k, t_1)$ and $\epsilon_2(2k, t_1) \to 0$ as $k \to \infty$. Similarly, under the substitution $x = (z/\sqrt{2k}) + \theta$,


I 


e+(.vl^/ik) 


exp 


iti 


ix - R')] fix) dx - f + e) 


+ 11 (vat + ^ ^ (v^ + ^) d. + 


U2k, k) 


2k 


and 
pHCiz/Vw 


^fl+(i//v 2 A:) ~ " 1 aV { Z \ 

/, L^ i +") 


dz 


~'&l + ^ + u') / + 0) de + 

where $\epsilon_3(2k, t_1)$ and $\epsilon_4(2k, t_1) \to 0$ as $k \to \infty$ for each fixed $t_1$. Substituting these expressions in (1) and performing the indicated multiplications we find after some calculation that (1) can be reduced to the following form:


4>ik, ~ J_ 


( 2 /c + 1)1 
V^(^:l)' 2 “* 

-4 lyf 


tli<r^ — u'‘) — 4ifi(m — 0)yf 


Zl 


1 - 


(v^ 


+.) 


2k 


+ 


W2k ^ -J ,“^f(s^:^)dy, 


2k 


where $0 < z_1 < y$ and $\epsilon(2k, t_1) \to 0$ for every fixed $t_1$ as $k \to \infty$. Now taking the limit as $k \to \infty$, we have


lim tfiik , ti) = f exp — ^ (c® — n'^', 

*-»« J-oo yir L 2 


‘) 




iikim — 9)fi9)y 4[/“(0)]j/* 

2 2 


Upon performing the integration,

$$\lim_{k \to \infty} \phi(t_1, t_2) = \exp\left\{ -\frac12 \left[ t_1^2 \big( (m - \theta)^2 + \sigma^2 - u'^2 \big) + \frac{2 t_1 t_2 (m - \theta)}{2 f(\theta)} + \frac{t_2^2}{4 f^2(\theta)} \right] \right\} .$$


Since $\sigma > u'$, this is the characteristic function for two variables which are normally distributed. Thus, the simultaneous distribution of $\xi$ and $M$ is asymptotically normal. It is of interest to note that, if the pdf $f(x)$ is symmetric, the correlation coefficient is zero, and $M$ and $\xi$ are asymptotically independent. We might also note that $\phi(t_1, 0)$ is the characteristic function for the mean deviation from the sample median. Thus, the random variable $M$ is asymptotically normal with asymptotic mean and variance $u'$ and $((m - \theta)^2 + \sigma^2 - u'^2)/2k$ respectively.
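
The two statistics and the asymptotic variances of the theorem can be made concrete with a short Python sketch. The function names are hypothetical, the divisor $2k$ in the mean deviation is an assumption of this sketch (the normalization is not legible in every copy of the definition), and the second function specializes the theorem to a standard normal parent, for which $m = \theta = 0$, $\sigma = 1$, $f(\theta) = 1/\sqrt{2\pi}$ and $u' = \sqrt{2/\pi}$.

```python
import math

def median_and_mean_deviation(sample):
    """Sample median xi and mean deviation M for a sample of 2k + 1 values."""
    if len(sample) % 2 == 0:
        raise ValueError("the note assumes an odd sample size 2k + 1")
    ordered = sorted(sample)
    k = len(ordered) // 2
    xi = ordered[k]
    # Assumption: the sum of absolute deviations is divided by 2k; the term
    # belonging to the median itself is zero.
    m_dev = sum(abs(x - xi) for x in ordered) / (2 * k)
    return xi, m_dev

def asymptotic_variances_standard_normal(k):
    """Asymptotic variances of xi and M from the theorem, standard normal parent."""
    density_at_median = 1.0 / math.sqrt(2.0 * math.pi)  # f(theta) with theta = 0
    u_prime = math.sqrt(2.0 / math.pi)                   # E|x - theta|
    var_median = 1.0 / (4.0 * density_at_median ** 2 * 2 * k)
    # (m - theta)^2 + sigma^2 - u'^2 with m = theta = 0 and sigma = 1
    var_mean_dev = (0.0 + 1.0 - u_prime ** 2) / (2 * k)
    return var_median, var_mean_dev

xi, m_dev = median_and_mean_deviation([10, 1, 3, 2, 4])
var_median, var_mean_dev = asymptotic_variances_standard_normal(k=50)
```

Since the normal density is symmetric, the theorem gives a zero asymptotic correlation between the two statistics in this special case.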

The author wishes to express his appreciation to Professor A. T. Craig for valuable suggestions in the study of this problem.

REFERENCES 

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] R. K. Zeigler, "On the mean deviation from the median," unpublished thesis, State University of Iowa.


NOTE ON THE EXTENSION OF CRAIG’S THEOREM TO NON-CENTRAL 

VARIATES 

By Osmer Carpenter

Carbide and Carbon Chemical Corporation, Oak Ridge 

A theorem due to A. T. Craig [1] and H. Hotelling [3] concerning the distribution of real quadratic forms in normal variates is extended to the case of non-central normal variates with equal variance.

The following notation is used: $A$, $A_1$, $A_2$ are real symmetric matrices, $L$ is an orthogonal matrix, $\Gamma$ is a diagonal matrix of latent roots, and $X$, $Y$, $M$ and $U$ are column vectors.

THEOREM. Let $X' = (x_1, \cdots, x_n)$ be a set of normally and independently distributed variates with equal variance $\sigma^2$ and means $M' = (m_1, \cdots, m_n)$. Then, a necessary and sufficient condition that a real symmetric quadratic form $Q(X) = X'AX$ of rank $r$ be distributed as $\sigma^2 \chi'^2$, where

(1) $p(\chi'^2, r, \lambda^2) = \tfrac12 e^{-\lambda^2} e^{-\chi'^2/2} (\chi'^2/2)^{(r-2)/2} \displaystyle\sum_{j=0}^{\infty} (\lambda^2 \chi'^2/2)^j / j!\, \Gamma[(r + 2j)/2],$

is that $A^2 = A$. If $Q(X)/\sigma^2$ is distributed by $p(\chi'^2, r, \lambda^2)$, then $\lambda^2 = Q(M)/2\sigma^2$.

Further, let $Q_1(X) = X'A_1X$ and $Q_2(X) = X'A_2X$ be real symmetric quadratic forms of ranks $r_1$ and $r_2$. Then a necessary and sufficient condition that $Q_1(X)$ and $Q_2(X)$ be statistically independent is that $A_1A_2 = 0$.
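
The two matrix conditions of the theorem are easy to verify for a familiar pair of forms. The following Python sketch is an illustration that is not taken from the note: it takes $n = 4$, lets $A_1$ be the matrix of the form $n\bar{x}^2$ and $A_2$ the matrix of the sum of squared deviations from the mean, and uses hypothetical helper names.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]

def is_close(a, b, tol=1e-12):
    """True when every entry of a is within tol of the matching entry of b."""
    return all(
        abs(x - y) <= tol
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )

n = 4
# X'A1X = n * (sample mean)^2
A1 = [[1.0 / n] * n for _ in range(n)]
# X'A2X = sum of squared deviations from the sample mean
A2 = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
zero = [[0.0] * n for _ in range(n)]

a1_idempotent = is_close(matmul(A1, A1), A1)    # A1^2 = A1
a2_idempotent = is_close(matmul(A2, A2), A2)    # A2^2 = A2
product_is_zero = is_close(matmul(A1, A2), zero)  # A1 A2 = 0
```

Both matrices are idempotent and their product vanishes, so the theorem covers the usual independence of the sample mean and the sum of squared deviations, including the case of unequal means.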

Proof. The theorem is proved by establishing the equivalence and factorization of moment generating functions [4]. The moment generating function of $p(\chi'^2, r, \lambda^2)$ is

(2) $G(t) = (1 - t)^{-r/2}\, e^{\lambda^2 t/(1 - t)}$.

Let $x_1, \cdots, x_n$ be normally and independently distributed with means $E(x_i) = m_i$ and common variance $\sigma^2$. Without loss of generality, we may take $\sigma^2 = 1$, changing to the general case when necessary with the transformation $x_i = z_i/\sigma$.

Let $Q(X) = X'AX$ be a real symmetric quadratic form of rank $r$. Then the moment generating function of $Q(X)$ is

(3) $G_Q(t) = E(e^{tQ/2}) = (2\pi)^{-n/2} \displaystyle\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{tQ/2 - \frac12 \sum (x_i - m_i)^2} \prod_{i=1}^{n} dx_i .$

If $t$ is restricted to values such that $|t| < |1/\gamma_0|$, where $\gamma_0$ is the dominant latent root of $A$, then $I - tA$ is positive definite and

(4) $G_Q(t) = | I - tA |^{-1/2}\, e^{\frac12 t M'A(I - tA)^{-1}M} .$

If $L$ is an orthogonal matrix such that

$$L'AL = \Gamma = \mathrm{diag}(\gamma_1, \cdots, \gamma_n),$$

where the $\gamma_i$ are the latent roots of $A$, then the transformation $M = LU$ gives

(5) $G_Q(t) = \displaystyle\prod_{i=1}^{n} (1 - t\gamma_i)^{-1/2} \exp\left\{ \frac{t}{2} \sum_{i=1}^{n} \frac{\gamma_i u_i^2}{1 - t\gamma_i} \right\} .$

A necessary and sufficient condition that $G_Q(t) = G(t)$ is that $A^2 = A$. If $A^2 = A$, then all of the latent roots of $A$ are $+1$ or $0$, and sufficiency can be established by substituting the appropriate value of each $\gamma_i$ into equation (5), giving

(6) $G_Q(t) = (1 - t)^{-r/2}\, e^{\lambda^2 t/(1 - t)} = G(t)$.

Also $\lambda^2 = \sum \gamma_i u_i^2/2 = \tfrac12 (U'\Gamma U) = \tfrac12 (M'AM) = Q(M)/2$.

It is apparent from the form of $G_Q(t)$ that a necessary condition for $G_Q(t) = G(t)$ is that $| I - tA |^{-1/2} = (1 - t)^{-r/2}$. But it has been proved by Craig [1] that the condition $A^2 = A$ is necessary, as well as sufficient, for this equality.

Next, let $Q_1(X) = X'A_1X$ and $Q_2(X) = X'A_2X$ be real symmetric quadratic forms of ranks $r_1$ and $r_2$. Then from (4)

(7) $G(t_1, t_2) = | I - t_1A_1 - t_2A_2 |^{-1/2}\, e^{\frac12 M'(t_1A_1 + t_2A_2)(I - t_1A_1 - t_2A_2)^{-1}M},$

$t_1$, $t_2$ being restricted to values for which $(I - t_1A_1 - t_2A_2)$ is positive definite. A necessary and sufficient condition that $G(t_1, t_2) = G_{Q_1}(t_1)\, G_{Q_2}(t_2)$ is $A_1A_2 = 0$. The required equation in the moment generating functions is

(8) $G(t_1, t_2) = | I - t_1A_1 |^{-1/2}\, | I - t_2A_2 |^{-1/2}\, e^{\frac12 M'[t_1A_1(I - t_1A_1)^{-1} + t_2A_2(I - t_2A_2)^{-1}]M} .$

Assume $A_1A_2 = 0$. Then $(I - t_1A_1 - t_2A_2) = (I - t_1A_1)(I - t_2A_2)$ and $| I - t_1A_1 - t_2A_2 | = | I - t_1A_1 | \cdot | I - t_2A_2 |$. Also

$$(t_1A_1 + t_2A_2)(I - t_1A_1 - t_2A_2)^{-1} = t_1A_1(I - t_1A_1)^{-1} + t_2A_2(I - t_2A_2)^{-1},$$

for using the identity $tA(I - tA)^{-1} = (I - tA)^{-1} - I$, this becomes

$$(I - t_2A_2)^{-1}(I - t_1A_1)^{-1} = (I - t_1A_1)^{-1} + (I - t_2A_2)^{-1} - I .$$

Multiplying both sides on the left by $(I - t_2A_2)$ and on the right by $(I - t_1A_1)$, the identity follows. Thus the condition is sufficient.

It is apparent from the form of the moment generating functions that a necessary condition for $G(t_1, t_2) = G_{Q_1}(t_1)\, G_{Q_2}(t_2)$ is that $| I - t_1A_1 - t_2A_2 | = | I - t_1A_1 | \cdot | I - t_2A_2 |$. However, it has been proved by Hotelling [3] and Craig [2] that the condition $A_1A_2 = 0$ is necessary for this equality.

An extension can be made to correlated variates. Let $X' = (x_1, \cdots, x_n)$ be normally distributed with non-singular correlation matrix $B$ and means $M' = (m_1, \cdots, m_n)$. Then there exists a non-singular transformation $X = TZ$, such that the variates $Z$ are independent and have unit variance. Thus $T^{-1}B\,T'^{-1} = I$, $B = TT'$ and $Q(X) = X'AX = Z'T'ATZ$. Applying the theorem proved above, a necessary and sufficient condition that $Q(X)$ be distributed as $\chi'^2$ is that $(T'AT)^2 = T'ABAT = T'AT$, or that $ABA = A$. As before, $\lambda^2 = Q(M)/2$. In the same manner, a necessary and sufficient condition for independence of $Q_1(X)$ and $Q_2(X)$ is that $(T'A_1T)(T'A_2T) = T'A_1BA_2T = 0$, or that $A_1BA_2 = 0$.

REFERENCES 

[1] Allen T. Craig, "A note on the independence of certain quadratic forms," Annals of Math. Stat., Vol. 14 (1943), page 195.

[2] Allen T. Craig, "Bilinear forms in normally correlated variables," Annals of Math. Stat., Vol. 18 (1947), page 565.

[3] H. Hotelling, "A note on a matric theorem of A. T. Craig," Annals of Math. Stat., Vol. 15 (1944), page 427.

[4] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943.





A SECOND FORMULA FOR THE PARTIAL SUM OF HYPERGEOMETRIC 
SERIES HAVING UNITY AS THE FOURTH ARGUMENT 


By Hermann von Schelling 
Naval Medical Research Laboratory, New London, Connecticut*


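A convergent hypergeometric series with 1 as fourth argument has been expressed by Gauss, using gamma functions, as follows:

(1) $F(\alpha, \beta, \gamma; 1) = 1 + \dfrac{\alpha\beta}{\gamma \cdot 1} + \dfrac{\alpha(\alpha + 1)\,\beta(\beta + 1)}{\gamma(\gamma + 1) \cdot 1 \cdot 2} + \cdots = \dfrac{\Gamma(\gamma)\,\Gamma(\gamma - \alpha - \beta)}{\Gamma(\gamma - \alpha)\,\Gamma(\gamma - \beta)} .$

Let us denote the $\nu$th partial sum of $F(\alpha, \beta, \gamma; 1)$ by $F_\nu(\alpha, \beta, \gamma; 1)$, and let us put

(2) $\dfrac{F_\nu(\alpha, \beta, \gamma; 1)}{F(\alpha, \beta, \gamma; 1)} = G_\nu(\alpha, \beta, \gamma) .$

The following equation is obvious:

(3) $G_\nu(\alpha, \beta, \gamma) = G_\nu(\beta, \alpha, \gamma)$.

In [1] it is shown that

(4) $G_\nu(\alpha, \beta, \gamma) = 1 - G_\alpha(\nu, \gamma - \beta - \alpha, \gamma - \alpha + \nu)$

is valid if $\alpha$ is a positive integer.

If $(\gamma - \beta - \alpha)$ is a positive integer, (3) and (4) yield

$$G_\nu(\alpha, \beta, \gamma) = 1 - G_\alpha(\gamma - \beta - \alpha, \nu, \gamma - \alpha + \nu) = G_{\gamma - \beta - \alpha}(\alpha, \beta, \alpha + \beta + \nu) .$$

In terms of partial sums of the hypergeometric series this becomes

(5) $\dfrac{\Gamma(\gamma - \alpha)\,\Gamma(\gamma - \beta)}{\Gamma(\gamma)\,\Gamma(\gamma - \beta - \alpha)}\, F_\nu(\alpha, \beta, \gamma; 1) = \dfrac{\Gamma(\alpha + \nu)\,\Gamma(\beta + \nu)}{\Gamma(\nu)\,\Gamma(\alpha + \beta + \nu)}\, F_{\gamma - \beta - \alpha}(\alpha, \beta, \alpha + \beta + \nu; 1),$

which is a new formula involving partial sums of hypergeometric series with 1 as fourth argument. It is more useful than (4) if $\gamma - \beta - \alpha < \alpha$ or $\gamma < 2\alpha + \beta$.

It is of theoretic interest that the arguments of the new series do not depend on the third argument $\gamma$ of the original series. Therefore it is possible to develop a simple recursion formula. If we write (5) for $(\gamma - 1)$ instead of $\gamma$, the series of the second member has one term less. Subtracting these equations yields after some simplifications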


* Opinions or conclusions contained in this paper are those of the author. They are not to be construed as necessarily reflecting the views or endorsement of the Navy Department.



(6) $(\gamma - \alpha - 1)(\gamma - \beta - 1)\, F_\nu(\alpha, \beta, \gamma; 1) - (\gamma - \beta - \alpha - 1)(\gamma - 1)\, F_\nu(\alpha, \beta, \gamma - 1; 1) = \dfrac{\Gamma(\nu + \alpha)}{\Gamma(\alpha)} \cdot \dfrac{\Gamma(\nu + \beta)}{\Gamma(\beta)} \cdot \dfrac{\Gamma(\gamma)}{\Gamma(\nu + \gamma - 1)} \cdot \dfrac{1}{\Gamma(\nu)} .$

Many recursion formulas are known for hypergeometric functions, but (6) may be the first equation of this type linking two hypergeometric partial sums of $\nu$ terms each.

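In order to demonstrate the numerical advantage of the new formula (5), we restate the example of [1]. An urn may contain $N$ balls of which $a$ black and $b$ white. A single ball is drawn. We note its color, return the ball into the urn and add $\Delta$ balls of the same color. The probability that the $n_1$-th black ball appears at the latest in the $n$-th drawing is

(7) $W(n) = \dfrac{a(a + \Delta) \cdots (a + (n_1 - 1)\Delta)}{N(N + \Delta) \cdots (N + (n_1 - 1)\Delta)}\, F_{n - n_1 + 1}\!\left( n_1, \dfrac{b}{\Delta}, \dfrac{N}{\Delta} + n_1; 1 \right) .$

If $a/\Delta$ is a positive integer (5) yields

(8) $W(n) = \dfrac{(n - n_1 + 1)(n - n_1 + 2) \cdots n}{\left( \frac{b}{\Delta} + n - n_1 + 1 \right)\left( \frac{b}{\Delta} + n - n_1 + 2 \right) \cdots \left( \frac{b}{\Delta} + n \right)}\, F_{a/\Delta}\!\left( n_1, \dfrac{b}{\Delta}, \dfrac{b}{\Delta} + n + 1; 1 \right) .$

If we take

$$\Delta = 1, \quad a = 1, \quad b = N - 1,$$

we get

(9) $W(n) = \dfrac{n!\,(N + n - n_1 - 1)!}{(n - n_1)!\,(N + n - 1)!} .$

Calculating $W(n)$, using the original formula (7), is quite tedious, but (5) sometimes simplifies the numerical work. Let us calculate the probability $W(6)$ that the third black ball appears in the 6th drawing, if the number of the original balls is $N = 10$. Using formulas (7), (4), and (9) respectively we have

$$W(6) = \frac{3!\,9!}{12!} \left[ 1 + \frac{3 \cdot 9}{13 \cdot 1} + \frac{(3 \cdot 4)(9 \cdot 10)}{(13 \cdot 14)(1 \cdot 2)} + \frac{(3 \cdot 4 \cdot 5)(9 \cdot 10 \cdot 11)}{(13 \cdot 14 \cdot 15)(1 \cdot 2 \cdot 3)} \right] = \frac{4}{91},$$

$$W(6) = 1 - \frac{12!\,9!}{8!\,13!} \left[ 1 + \frac{4 \cdot 1}{14 \cdot 1} + \frac{(4 \cdot 5)(1 \cdot 2)}{(14 \cdot 15)(1 \cdot 2)} \right] = \frac{4}{91},$$

$$W(6) = \frac{6!\,12!}{3!\,15!} = \frac{4}{91} .$$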



The time saved in using both formulas, of course, increases as the number of terms, $n - n_1 + 1$, of the original series increases.

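Let us mention that the special distribution corresponding to (9) does not have finite moments. For arbitrary values of $N$, $a$, $\Delta$ the arithmetic mean is

(10) $E(n) = \dfrac{N - \Delta}{a - \Delta} \cdot n_1,$

the expectation of $n(n + 1)$ is

(11) $E[n(n + 1)] = \dfrac{(N - \Delta)(N - 2\Delta)}{(a - \Delta)(a - 2\Delta)} \cdot n_1(n_1 + 1),$

and finally the variance

(12) $\sigma^2 = \dfrac{(N - \Delta)(N - a)[(n_1 - 1)\Delta + a]}{(a - \Delta)^2 (a - 2\Delta)} \cdot n_1 .$

The mode can be derived from the fact that

(13) $w(n + 1) = w(n)$ for $n = \dfrac{N}{a + \Delta} \cdot (n_1 - 1) .$

Especially we get $w(11) = w(10)$ for our numerical example.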

The mean and variance do not exist for $a = \Delta = 1$, as in our example. However, it is possible to find a number $n$ so that $W(n)$ takes any value near to unity, for instance .99. For large $n$ and small $n_1$ (9) yields the approximation

$$W(n) = \frac{n(n - 1) \cdots (n - n_1 + 1)}{(N + n - 1)(N + n - 2) \cdots (N + n - n_1)} \approx \left[ \frac{n - \frac{n_1 - 1}{2}}{N + n - \frac{n_1 + 1}{2}} \right]^{n_1} .$$

Hence, $W(2666) = .99$ for our example. One needs 2666 trials if one wants a 99% probability for getting three black balls. This surprising result cannot be derived from the original formula (7).
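
The numerical example can be reproduced exactly with rational arithmetic. The Python sketch below is an illustration with hypothetical function names: `w_direct` steps through the urn scheme draw by draw, and `w_closed` evaluates the closed form (9), which applies only to the special case $\Delta = 1$, $a = 1$, $b = N - 1$.

```python
from fractions import Fraction
from math import factorial

def w_direct(n, n1, a, b, delta):
    """P(the n1-th black ball has appeared by draw n), found by stepping through the urn."""
    # Map: number of black balls drawn so far (capped at n1) -> probability.
    probs = {0: Fraction(1)}
    for draw in range(n):
        new_probs = {}
        for blacks, p in probs.items():
            if blacks == n1:
                # Absorbing state: the n1-th black ball has already appeared.
                new_probs[n1] = new_probs.get(n1, Fraction(0)) + p
                continue
            total = a + b + draw * delta
            p_black = Fraction(a + blacks * delta, total)
            new_probs[blacks + 1] = new_probs.get(blacks + 1, Fraction(0)) + p * p_black
            new_probs[blacks] = new_probs.get(blacks, Fraction(0)) + p * (1 - p_black)
        probs = new_probs
    return probs.get(n1, Fraction(0))

def w_closed(n, n1, N):
    """Formula (9), valid for delta = 1, a = 1, b = N - 1."""
    return Fraction(
        factorial(n) * factorial(N + n - n1 - 1),
        factorial(n - n1) * factorial(N + n - 1),
    )

w6_direct = w_direct(n=6, n1=3, a=1, b=9, delta=1)
w6_closed = w_closed(n=6, n1=3, N=10)
w2666 = float(w_closed(n=2666, n1=3, N=10))
```

Both routes give $W(6) = 4/91$, and the closed form evaluated at $n = 2666$ agrees with .99 to two decimals.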


REFERENCE

[1] H. von Schelling, "A formula for the partial sums of some hypergeometric series," Annals of Math. Stat., Vol. 20 (1949), pp. 120-122.





ABSTRACTS OF PAPERS 

(Abstracts of papers presented at the Chicago meeting of the Institute, April 28-29, 1950)

1. The Distribution of the Quotient of Ranges in Samples from a Rectangular 
Population. Paul R. Rider, Washington University, St. Louis, Missouri. 

The distribution of the quotient of the ranges of two independent, random samples from a continuous rectangular population is derived. The distribution is independent of the population range and can be used to test the hypothesis that two samples came from the same rectangular population just as the distribution of the variance ratio is used to test whether two samples came from the same normal population.

2. A Geometric Method for Finding the Distribution of Standard Deviations 
when the Sampled Population Is Arbitrary. (Preliminary Report). Paul 
Irick, Purdue University. 

For an ordered random sample, $x_1 \le x_2 \le \cdots \le x_n$, chosen from a population, $f(x)$, $a \le x \le b$, let $r_i = x_{i+1} - x_i \ge 0$, $i = 1, 2, \cdots, n - 1$. Make the transformation



and call U' the 1/n! portion of the r' space bounded by the n - 1 sphere and hyperplanes,

/2 # /^ MM ^ f ^ 

S r, = 2ns*, r, = i /-r,_i , « = 1, 2, ••• , n - 1, where s is the sample standard 

> K « + 1 


deviation. The point density in U', $\delta(r')$, is the transform of

-fr—Sft 

$= \int f(x_1)\, f(x_1 + r_1) \cdots f(x_1 + r_1 + \cdots + r_{n-1})\, dx_1.$

Change to generalized polar coordinates and call U the outer hyperspherical boundary
of U' whereon the density is designated by $\delta(\sqrt{2n}\, s, \varphi)$. Then p(s), the probability law for
s, is given by

p(s) ds = nln"/*s"~’ds / • •/ j(V^s, v>) sin""'sin v'n-i ■ -difi, 

•'fi J<Pn-t 


(n — t)(i + l) 


g ■Pi g arc cos 


tan(p.-ij,t= 1,2, ••• , 71 - 2, 


whenever b is infinite. The distribution of sample range is readily found in U' and is
expressible in the same form as p(s) with the same limits of integration. When b is finite,
the complete integral holds only for $0 \le s \le (b - a)/\sqrt{2n}$, there being $n^2/4$ connected arcs
in p(s) if n is even, and $(n^2 - 1)/4$ arcs if n is odd. The axes are rotated to give relatively
simple formulas for p(s) when $n \le 4$, the case of n = 6 also being discussed. The method
readily produces previously reported results for p(s). In the application of the method,
particular attention has been paid to the Type III and polynomial Type I populations. The
density function provides much information concerning the form of p(s) for various
populations, and contours of constant $\delta$ in U' are of theoretical interest.





3. Probability of a Correct Result with a Certain Rounding-off Procedure.
W. S. Loud, University of Minnesota.

Consider the problem of the addition of n numbers expressed in the base B of numeration.
Supposing each number known to arbitrary accuracy, to obtain the sum accurate to k places,
one may round off each number to (k + 1) places, add, and round the sum to k places. If
the numbers are assumed uniformly distributed, the probability that the above procedure
gives the correct result may be found explicitly by use of characteristic functions. If the

base B Is odd, the result is 2(jr5)“‘ j Bin"~'n8in*Ba du, and if the base B is even, 

2 (irJS)~' I sin® Bit cos u du Both formulas have the asymptotic formula 6'/*S(irn)~h2 
Jo 

as n becomes infinite. 
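
The rounding procedure described in this abstract can be simulated directly. The Python sketch below is an illustration, not the author's method: it works in integer "fine units" (a hypothetical device to avoid floating-point ties), rounds each number to the (k + 1)-th place, adds, rounds the sum to the k-th place, and compares with the rounded true sum. The comparison value $\sqrt{6}\,B\,(\pi n)^{-1/2}$ is an assumption: it is one reading of the partly illegible asymptotic expression above, and it agrees with a normal approximation to the accumulated rounding error.

```python
import math
import random


def round_to_multiple(x, q):
    """Round the non-negative integer x to the nearest multiple of q (ties go up)."""
    return ((2 * x + q) // (2 * q)) * q


def procedure_is_correct(numbers, B, extra):
    """True if rounding each number to (k+1) places, adding, and rounding the
    sum to k places gives the same result as rounding the exact sum to k places.
    Numbers are integers counted in units of B**(-extra) of the k-th place."""
    unit_k = B ** extra            # one unit in the k-th place
    unit_k1 = B ** (extra - 1)     # one unit in the (k+1)-th place
    rounded_sum = sum(round_to_multiple(x, unit_k1) for x in numbers)
    return round_to_multiple(rounded_sum, unit_k) == round_to_multiple(sum(numbers), unit_k)


def estimate_probability(n, B, reps, extra=6, seed=0):
    """Monte Carlo estimate of the probability of a correct result."""
    rng = random.Random(seed)
    top = B ** (extra + 2)         # numbers are uniform on 0 .. top-1
    hits = 0
    for _ in range(reps):
        numbers = [rng.randrange(top) for _ in range(n)]
        hits += procedure_is_correct(numbers, B, extra)
    return hits / reps


def normal_approximation(n, B):
    """Assumed large-n value: sqrt(6) * B / sqrt(pi * n)."""
    return B * math.sqrt(6.0 / (math.pi * n))
```

With, say, B = 3 and n = 200, `estimate_probability(200, 3, 3000)` can be set beside `normal_approximation(200, 3)`; the two should be of the same size, with the simulated value somewhat smaller because n is finite.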

4. Analysis of a One-person Game. (Preliminary Report). W. M. Kincaid,
University of Michigan.

The problem of allocation of supplies is one which arises in many military and economic
connections. The present report discusses a game constructed as a model of a simple
situation of this type. The player is given a supply of cards, and receives payments for giving
these up when certain random events occur during the period of play.

The optimal strategy, which maximizes the expected value of these payments, is
governed by certain critical times such that the player’s response to a particular event depends
on whether it occurs before or after one of these times.


NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. Leo A. Aroian, on leave from Hunter College, is acting as a Research 
Physicist in charge of computations at the Hughes Aircraft Co., Department 
of Electronics and Guided Missiles, Culver City, California. 

Dr. Ralph A. Bradley from McGill University, Montreal, Canada will join
the staff as Associate Professor in the Department of Statistics at Virginia
Polytechnic Institute on July 1, 1950. He will devote the majority of his time
to research on rank order statistics.

Dr. E. R. Dalziel has relinquished his post as Assistant Master at Technical
School, New Zealand, to become Senior Engineer with the Overseas
Telecommunication Commission, Australia.

On September 1, Dr. David Duncan from the University of Sydney, Sydney,
Australia, will join the statistical staff of Virginia Polytechnic Institute as
Associate Professor of Statistics. He will devote the majority of his time to teaching.

Dr. C. H. Fischer has been promoted to the rank of Professor of Actuarial 
Mathematics in the Department of Mathematics and Professor of Insurance in 
the School of Business Administration, University of Michigan, Ann Arbor,
Michigan. 





Dr. E. J. Gumbel, Professor of Statistics at the New York New School for
Social Research, has been appointed Consultant to the National Bureau of
Standards and has been awarded a Guggenheim fellowship for finishing a book on the
theory of extreme values.

Dr. Eugene Lukacs, who has been on leave from Our Lady of Cincinnati
College and working as a Statistician for the U. S. Naval Ordnance Test Station,
Inyokern, California, is transferring to the Statistical Engineering Laboratory,
National Bureau of Standards, Washington, D. C.

Dr. R. B. Leipnik, formerly a member of the Institute for Advanced Study, 
has accepted a position as Assistant Professor of Mathematics at the University 
of Washington, Seattle. 

Mr. Harold C. Mathisen, Jr., of the Kaiser-Frazer Corporation has been
transferred from Willow Run, Michigan, where he was an Assistant to the Director
of Sales, to Buffalo, New York, as Regional Credit-Distribution Supervisor.

Mr. Jack Moshman has resigned from the U. S. Atomic Energy Commission
at Oak Ridge, Tennessee, to accept a position as Statistician with the
Mathematics Panel of the Oak Ridge National Laboratory.

Dr. D. N. Nanda is now acting as Senior Scientific Officer in statistics at the
Technical Development Estt. Laboratory at Kanpur, India.

Mr. Shanti A. Vora was awarded at the commencement June 5, 1950, the
degree of Doctor of Philosophy in Mathematical Statistics from the University
of North Carolina, Chapel Hill. His dissertation, entitled “Bounds on the
Distribution of Chi-Square,” won the William Chambers Coker Award in Science
for 1950 granted by the Elisha Mitchell Scientific Society for excellence in
research in all the scientific departments of the university. He has been appointed
Acting Assistant Professor in the Department of Statistics at Stanford
University, California, effective July 1, 1950, where he will be principally employed
in research on sampling inspection.

Professor Abraham Wald, Chairman, Department of Mathematical Statistics, 
Columbia University, gave a series of lectures on the theory of statistical decision 
functions at the Naval Ordnance Test Station, Inyokern, California, April 3-7,
1950. Representatives from several organizations and educational institutions
on the Pacific coast attended the lectures. 


A copy of the bulletin of the Graduate School of Public Health, University 
of Pittsburgh, has been received at the Secretary’s office. The program of the 
Department of Biostatistics will be of particular interest to readers of the Annals. 
The teaching and research activities of the Department of Biostatistics are aimed 
primarily at the development of methods for the statistical appraisal of the 
health problems of groups: the community, the family, and the special aggregates 
such as the population in industry and in school. 





The Educational Testing Service is offering for 1951-52 its fourth series of
research fellowships in psychometrics leading to the Ph.D. degree at Princeton
University. Open to men who are acceptable to the Graduate School of the
University, the two fellowships each carry a stipend of $2,375 a year and are
normally renewable. Fellows will be engaged in part-time research in the general
area of psychological measurement at the offices of the Educational Testing
Service and will, in addition, carry a normal program of studies in the Graduate
School. Competence in mathematics and psychology is a prerequisite for
obtaining these fellowships. Information and application blanks may be obtained from:
Director of Psychometric Fellowship Program, Educational Testing Service, 20
Nassau Street, Princeton, New Jersey.


Preliminary Actuarial Examinations 
Prize Awards 

The winners of the prize awards offered by the Society of Actuaries to the
nine undergraduates ranking highest on the score of Part 2 of the 1950
Preliminary Actuarial Examinations are as follows:

First Prize of $200

Mattuck, Arthur P.            Swarthmore College

Additional Prizes of $100

Dempster, Arthur P.           University of Toronto
Haslam, M. Brent              University of Buffalo
Hudek, Paul R.                University of Minnesota
Jamieson, J. Rao.             University of Toronto
Leff, Milton M.               University of Western Ontario
Milnor, John W.               Princeton University
Reynolds, William F.          College of the Holy Cross
Walter, John R.               University of Toronto

The Society of Actuaries has authorized a similar set of nine prizes for the 
1951 examinations on Part 2. 

The Preliminary Actuarial Examinations consist of the following three
examinations:

Part 1. Language Aptitude Examination.

(Reading comprehension, meaning of words and word relationships, antonyms, and
verbal reasoning.)

Part 2. General Mathematics Examination.

(Algebra, trigonometry, coordinate geometry, differential and integral calculus.)

Part 3. Special Mathematics Examination.

(Finite differences, probability and statistics.)

The 1951 Preliminary Actuarial Examinations will be prepared by the
Educational Testing Service and will be administered by the Society of Actuaries at
centers throughout the United States and Canada on May 18, 1951. The closing
date for applications is March 15, 1951.

Detailed information concerning the Examinations can be obtained from: 

The Society of Actuaries 
208 South LaSalle Street 
Chicago 4, Illinois 






New Members 

The following persons have been elected to membership in the Institute
(March 1, 1950 to May 31, 1950)

Ard, Everett E., B.S. (Kansas State Teachers College), Student, University of Michigan,
1867 Monson Court, Willow Run, Michigan.

Bainbridge, T. R., B.S. (Clemson College, S. C.), Supervisor, Koda Quality Inspection
Group, Tennessee Eastman Corporation, Kingsport, Tennessee.

Bankier, James D., Ph.D. (Rice Institute), Associate Professor, Mathematics Department,
McMaster University, Hamilton, Ontario, Canada.

den Broeder, Jr., George G., B.S. (Wayne Univ.), Student, Wayne University, 459 East
Grand Boulevard, Detroit 7, Michigan.

Casas, Luis T., Ph.D. (Univ. of Bogota, Colombia), Professor of Statistics, Universidad de
los Andes and Facultad de Economia Industrial y Comercial del Gimnasio Moderno,
also Statistician, Compania Colombiana de Seguros and Compania Colombiana de
Seguros de Vida, Apartado Nacional No. 2088, Bogota, Colombia.

Clark, Charles R., B.S. (Univ. of Michigan), Student, University of Michigan, ISIB West
Cross Street, Ypsilanti, Michigan.

Dolby, James L., M.A. (Wesleyan Univ.), Mathematical Physicist, Belding-Heminway,
Inc., 66 Grove Street, Putnam, Connecticut.

Elfving, Gustav, Ph.D. (Helsingfors, Finland), Professor of Mathematics, University of
Helsingfors, Finland, now visiting Professor, Mathematics Department, Cornell
University, Ithaca, New York.

Embody, Daniel R., M.S. (Cornell Univ.), Staff Statistician, The Washington Water Power
Company, P. O. Drawer 144^, Spokane 6, Washington.

Frazier, David, Ph.D. (Stanford Univ.), Research Chemist, Chemical and Physical
Research Division, The Standard Oil Company (Ohio), S1S7 Cornell Road, Cleveland 6,
Ohio.

Graf, Herman S., B.A. (Alfred Univ.), Student, Department of Mathematical Statistics,
University of North Carolina, 68 Winans Drive, Yonkers 2, New York.

Greenberg, Bernard G., Ph.D. (N. C. State College), Associate Professor and Acting Head,
Department of Biostatistics, School of Public Health, Associate Professor, Institute
of Statistics, Raleigh, North Carolina.

Grenander, Ulf, Ph.D. (Stockholm Univ.), Department of Mathematical Statistics,
Norrtullsgatan 16, Stockholm, Sweden.

Grosh, Jr., Louis E., M.S. (Purdue Univ.), Research Assistant, Mathematics Department,
Purdue University, W. Lafayette, Indiana.

Hoffman, Walter, M.A. (Wayne Univ.), Statistician, Research Laboratory, Childrens Fund
of Michigan, 2903 Elmhurst, Detroit 8, Michigan.

Hoffman, Robert G., A.B. (Stanford Univ.), Student, University of Michigan, 420 Thompson
Street, Ann Arbor, Michigan.

Hopkins, George D., B.S. (Ohio State), Statistician, Sylvania Electric Products, Inc.,
Ottawa, Ohio, 4^4 N. Jameson Avenue, Lima, Ohio.

Horowitz, Jacob, B.S. (Columbia Univ.), Graduate Student, Department of Mathematical
Statistics, Columbia University, 662 Riverside Drive, New York 27, New York.

Huntsberger, David V., M.S. (West Va. Univ.), Graduate Student, Iowa State College,
Ames, Iowa, 224 Pammel Court, Ames, Iowa.

Kempff-Mercado, Rolando, Lic. en Ciencias Econ. (Univ. Mayor de San Andres), Secretary
General of Yacimientos Petroliferos Fiscales Bolivianos (Bolivian Oil Field Authority),
P. O. Box 1283, La Paz, Bolivia.

Kennedy, Muriel E., B.Sc. (University of Alberta), Statistician, Special Surveys
Division, Dominion Bureau of Statistics, 128 Mason Terrace, Ottawa, Ontario, Canada.

Lander, Elmer L., B A (Western Reserve Umv , Cleveland), Student, University of Michi- 

^SAVitl _ A ..... 4>/l dtxa I\ 





Li-Min, Tang, M.S. (Univ. of Mich.), Student, University of Michigan, 1109 Willard, Ann
Arbor, Michigan.

Lin, Shao-kung, M.A. (Louisiana State Univ.), Student, Department of Economics,
University of Illinois, lA’OS} W. University Avenue, Urbana, Illinois.

Mandelson, Joseph, B.S. (College of City of New York), Mathematical Statistician, Chief,
Quality Assurance Branch, Inspection Division, Office of Chief, Army Chemical
Center, Maryland, SO Cedar Street, Edgewood Heights, Maryland.

Marthens, Arthur S., B.S. (Carnegie Inst. of Tech.), Mathematical Statistician, Bureau of
Ships, Navy Department, Washington, D. C., ISW Mtmlicr Street, Wilkinsburg,
Pennsylvania.

Masel, Marvin, A.M. (Columbia Univ.), Engineering Statistician, Goodyear Aircraft Corp.,
Akron, Ohio, Y.M.C.A., Room S09, 80 Center Street, Akron, Ohio.

McCune, Duncan C., B.A. (College of Wooster), Graduate Assistant, Mathematics
Department, Purdue University, Lafayette, Indiana.

Meyer, Paul L., B.S. (Univ. of Washington), Research Fellow, Laboratory of
Mathematical Statistics, University of Washington, OSOO-Sth, N.E., Seattle 5, Washington.

Milberg, Stanley, M.A. (Columbia Univ.), Statistician, SSO Turrell Avenue, So. Orange,
New Jersey.

Miser, Hugh J., Ph.D. (Ohio State Univ.), Operations Analyst, Headquarters, United States
Air Force, S71S Blaine Drive, Chevy Chase 15, Maryland.

Morrison, Milton, M.A. (Columbia Univ.), Instructor of Mathematics, Stevens Institute of
Technology, Hoboken, New Jersey.

Mulholland, Hugh P., Ph.D. (Cambridge Univ., England), Associate Professor of
Mathematics, American University of Beirut, Beirut, Lebanon.

Nelson, A. Carl, M.S. (Univ. of Delaware), Instructor in Mathematics, University of
Delaware, Newark, yi/ura/uihhin, Delaware.

Neuwirth, Sidney I., B.A. (N. Y. Univ.), Statistician, Biological Research Laboratories,
Schering Corporation, SO Orange Street, Bloomfield, New Jersey.

Pierce, James A., B.A. (Westminster College, Fulton, Mo.), Graduate Assistant, Purdue
University, S05 Sylvia, West Lafayette, Indiana.

Powell, Claude J., B.S. (Univ. of Tennessee), Quality Control Engineer, North American
Rayon Corp., tllS Hattie Avenue, Elizabethton, Tennessee.

Rojas, Basilio A., B.S. (National College of Agriculture, Mexico), Graduate Student, Iowa
State College, Statistical Laboratory, Ames, Iowa.

Roseboom, John H., M.S. (Dartmouth College), Instructor, Department of Economics,
Indiana University, Bloomington, Indiana.

Sandlin, William T., A.B. (Marshall College, Huntington, W. Va.), Independent Sales
Engineer, 20 Fairfax Drive, Huntington, West Virginia.

Schmitt, Samuel A., B.S. (Univ. of Chicago), Research Analyst, Department of Defense,
Washington, D. C., iJfW N. Rhodes Street, Arlington, Virginia.

Shaw, Richard H., M.S. (Purdue Univ.), Research Fellow, Purdue University, F.P.H.A.
513-1 Airport Road, West Lafayette, Indiana.

Smith, Hugh F., M.S.A. (Cornell Univ.), Professor of Experimental Statistics, Institute of
Statistics, University of North Carolina, Box 5^57, College Station, Raleigh, North
Carolina.

Sommers, Lysle D., B.S. (Bowling Green State Univ.), Sampling Assistant, Survey
Research Center, University of Michigan and Graduate Student, 1BS2 Leeds Court, Willow
Run, Michigan.

Springer, Clifford H., M.Sc. (Purdue Univ.), Instructor, Department of Mathematics,
Research Assistant, Statistical Laboratory, Recitation Building, Purdue University,
West Lafayette, Indiana.

Stearman, Robert L., M.S. (Oregon State College), Teaching Fellow, Department of
Mathematics, Oregon State College, Corvallis, Oregon.





Tingey, Fred H., M.S. (Univ. of Washington), Research Associate, University of
Washington, S310 Goldendale Place, Seattle 5, Washington.

Tipton, Lamar B., M.A. (Columbia Univ.), Statistical Clerk, Standard Oil Co. of New
Jersey, 8 W. 604th Street, Shanks Village, Orangeburg, New York.

Topp, Chester W., M.A. (Univ. of Illinois), Associate Professor of Mathematics, Fenn
College, 1634 Comton Road, Cleveland Heights 18, Ohio.

Vora, Shanti A., M.Sc. (Bombay), Student, Department of Mathematical Statistics,
University of North Carolina, 310 A, Phillips Hall, Chapel Hill, North Carolina.

Willis, Myron J., A.M. (Indiana Univ.), Instructor of Mathematics, Purdue University,
Statistical Laboratory, Lafayette, Indiana.


REPORT OF THE CHICAGO MEETING OF THE INSTITUTE 

The forty-third meeting and the first regional Mid-western meeting of the 
Institute of Mathematical Statistics was held on the campus of the University 
of Chicago, Chicago, Illinois on Friday and Saturday, April 28 and 29, 1950. 
The morning session on April 29 was held jointly with the American Mathe¬ 
matical Society. The following forty-six members of the Institute were registered 
as present: 

K. J. Arnold, Max Astrachan, Reinhold Baer, Alvin G. Brooks, I. W. Burr, P. G. Carlson,
Herman Chernoff, P. S. Dwyer, H. P. Evans, J. S. Frame, Mary Goins, R. D. Gordon,
John Gurland, P. R. Halmos, P. C. Hammer, W. L. Hart, M. A. Hatke, P. E. Irick, Howard
L. Jones, Leo Katz, J. P. Kelly, W. M. Kincaid, L. A. Knowler, Tjalling Koopmans, F. C.
Leone, F. W. Lott, W. G. Madow, A. M. Mark, John W. Mauchly, Kenneth May, Duncan
C. McCune, Cyril G. Peckham, G. B. Price, P. R. Rider, Norman Rudy, L. J. Savage, G. R.
Seth, Richard H. Shaw, Jack Sherman, M. D. Springer, Robert G. D. Steel, Z. Szatrowski,
J. V. Talacko, R. M. Thrall, L. M. Weiner, M. E. Wescott.

Professor Lloyd A. Knowler of the University of Iowa presided at the Friday 
afternoon session. The program consisted of the following invited papers: 

1. Why and Where Should Courses in Statistics Be Offered to Engineering Students? M. E.
Wescott, Northwestern University.

2. What and How Statistics Should be Taught to Engineering Students. I. W. Burr,
Purdue University.

Following this session a tea was given by the Department of Mathematics of
the University of Chicago.

Professor John Gurland of the University of Chicago presided at the Saturday 
morning session. This session was held jointly with the American Mathematical 
Society. The program was as follows: 

1. The Distribution of the Quotient of Ranges in Samples From a Rectangular Population.
Paul R. Rider, Washington University, St. Louis, Missouri.

2. A Geometric Method for Finding the Distribution of Standard Deviations when the
Sampled Population Is Arbitrary. (Preliminary report). Paul Irick, Purdue University.

3. Probability of a Correct Result with a Certain Rounding-off Procedure. W. S. Loud,
University of Minnesota.

4. Analysis of a One-person Game. (Preliminary report). W. M. Kincaid, University of
Michigan.





Professor W. G. Madow of the University of Illinois presided at the Saturday 
afternoon session. The program consisted of the following invited papers: 

1. Correlation and Regression with Matrix Factorization. P. S. Dwyer, University of
Michigan.

2. The Identification of Structural Characteristics. Tjalling Koopmans, University of
Chicago, and Olav Reiersøl, University of Oslo, Norway.


K. J. Arnold,

Associate Secretary



THE PROBLEM OF THE GREATER MEAN 

By Raghu Raj Bahadur and Herbert Robbins¹

University of North Carolina 

1. Introduction and summary. Let $\pi_1, \pi_2$ be normal populations with means
$m_1, m_2$ respectively and a common variance $\sigma^2$, the parameter point
$\omega = (m_1, m_2 : \sigma)$ which characterizes the two populations being unknown, and
let $\Omega$ be an arbitrary given set of possible points $\omega$. Random samples of fixed
sizes $n_1, n_2$ are drawn from $\pi_1, \pi_2$ respectively, giving the combined sample
point $v = (x_{11}, x_{12}, \ldots, x_{1n_1}; x_{21}, x_{22}, \ldots, x_{2n_2})$. For reasons which will be
made clear later in connection with practical examples, any function $f(v)$ such
that $0 \le f(v) \le 1$ is called a decision function, and for any such $f(v)$ the risk
function is defined to be

(1)  $r(f \mid \omega) = \max[m_1, m_2] - m_1 E[f \mid \omega] - m_2 E[1 - f \mid \omega] \ge 0,$

where $E$ denotes the expectation operator. A decision function $\bar{f}(v)$ is said to be
(a) uniformly better than $f(v)$ if $r(\bar{f} \mid \omega) \le r(f \mid \omega)$ for all $\omega$ in $\Omega$, the strict
inequality holding for at least one $\omega$, (b) admissible if no decision function is
uniformly better than $\bar{f}(v)$, and (c) minimax if

$\sup_{\omega \in \Omega} \, [r(\bar{f} \mid \omega)] = \inf_{f} \sup_{\omega \in \Omega} \, [r(f \mid \omega)].$

The “problem of the greater mean” is, for any given $\Omega$, to determine the
minimax decision functions, particularly those which are also admissible. Special
interest attaches to the case in which there exists a unique minimax decision
function $\bar{f}(v)$ (in the sense that if $f(v)$ is any minimax decision function then
$f(v) = \bar{f}(v)$ for almost every $v$ in the sample space); such an $\bar{f}(v)$ is automatically
admissible.

The problem of the greater mean is, of course, a special problem in Wald’s
general theory of statistical decision functions [1]. Our results will, however, be
derived by very simple direct methods which make no use of Wald's general
theorems.

We cite without proofs a few examples in order to show how strongly the
solution of the problem of the greater mean depends on the structure of $\Omega$. In
each case the minimax decision function is a function only of the two sample
means $\bar{x}_1, \bar{x}_2$.

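(i) Let $\Omega'$ consist of the two points $(a, b : \sigma)$ and $(b, a : \sigma)$, with $a < b$. Then

(2)  $f'(v) = \begin{cases} 1 & \text{if } n_1 \bar{x}_1 - n_2 \bar{x}_2 > (n_1 - n_2)(a + b)/2, \\ 0 & \text{otherwise,} \end{cases}$

is the unique minimax decision function.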
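
The rule in example (i) can be read as a likelihood comparison between the two parameter points. The following Python sketch is an illustration under stated assumptions (the function names and the numerical values in the usage note are hypothetical): it implements the rule in terms of the two sample means and checks, on simulated normal samples, that it agrees with choosing the point $(b, a : \sigma)$ whenever its log-likelihood exceeds that of $(a, b : \sigma)$.

```python
import random
from statistics import fmean


def decide_two_point(mean1, mean2, n1, n2, a, b):
    """Rule of example (i): 1 if n1*mean1 - n2*mean2 > (n1 - n2)(a + b)/2, else 0."""
    return 1 if n1 * mean1 - n2 * mean2 > (n1 - n2) * (a + b) / 2 else 0


def log_likelihood_ratio(sample1, sample2, a, b, sigma):
    """Normal log-likelihood at (b, a: sigma) minus that at (a, b: sigma)."""
    def sum_sq(xs, m):
        return sum((x - m) ** 2 for x in xs)

    log_lik_ba = -(sum_sq(sample1, b) + sum_sq(sample2, a)) / (2 * sigma ** 2)
    log_lik_ab = -(sum_sq(sample1, a) + sum_sq(sample2, b)) / (2 * sigma ** 2)
    return log_lik_ba - log_lik_ab


def agreement_rate(n1, n2, a, b, sigma, reps, seed=0):
    """Fraction of simulated samples on which the rule and the likelihood
    comparison give the same decision."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(reps):
        s1 = [rng.gauss(a, sigma) for _ in range(n1)]
        s2 = [rng.gauss(b, sigma) for _ in range(n2)]
        rule = decide_two_point(fmean(s1), fmean(s2), n1, n2, a, b)
        by_ratio = 1 if log_likelihood_ratio(s1, s2, a, b, sigma) > 0 else 0
        agree += rule == by_ratio
    return agree / reps
```

For example, `agreement_rate(4, 7, 0.0, 1.0, 1.0, 500)` compares the two decisions on 500 simulated pairs of samples with unequal sizes; the algebra behind the rule says they should coincide except on ties, which have probability zero.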

¹ This work was supported in part by the Office of Naval Research.



(ii) Let $\Omega''$ consist of the two points $(c + h, c : \sigma)$ and $(c - h, c : \sigma)$, with $h > 0$.
Then

(3)  $f''(v) = \begin{cases} 1 & \text{if } \bar{x}_1 > c, \\ 0 & \text{otherwise,} \end{cases}$

is the unique minimax decision function.

(iii) Let fi'" consist of the three points Q, — i:l), Q, a^l). ( —I,—i:l), and 
let Til ~ n% ^ n. Then 


(4) 




1 if 

0 otherwise, 


< 


where $\lambda$ is a certain definite constant, is the unique minimax decision function.

The parameter spaces of two or three points specified in these examples are
rather trivial, but in fact the corresponding decision functions (2), (3), (4)
remain the unique minimax solutions of the decision problem with respect to
much more general parameter spaces. Thus, for example, it is clear that $f'(v)$
will remain the unique minimax decision function with respect to any $\Omega$ which
contains $\Omega'$ and is such that

$\sup_{\omega \in \Omega} \, [r(f' \mid \omega)] = \sup_{\omega \in \Omega'} \, [r(f' \mid \omega)].$

Corresponding remarks apply to $f''(v)$ and $f'''(v)$.
When $n_1 = n_2$, (2) reduces to

(5)  $\bar{f}(v) = \begin{cases} 1 & \text{if } \bar{x}_1 > \bar{x}_2, \\ 0 & \text{otherwise.} \end{cases}$

This decision function is of particular interest when both the means $m_1, m_2$ are
unknown. It will be shown that whether or not $n_1 = n_2$, $\bar{f}(v)$ is the unique
minimax decision function under certain conditions on $\Omega$ which are likely to
hold in practice, at least when both $n_1$ and $n_2$ are sufficiently large (Theorem 3).
Likewise, $f''(v)$, which is the analogue of $\bar{f}(v)$ when one of the means ($m_2$) is
known exactly, is apt to be the unique minimax decision function in such cases,
at least when $n_1$ is sufficiently large (Theorem 4). These results on $\bar{f}(v)$ and
$f''(v)$ form the main results of the present paper.
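
For the rule that sets the decision function equal to 1 when the first sample mean exceeds the second and to 0 otherwise, the risk (1) has a closed form, because the difference of the two sample means is normal with mean $m_1 - m_2$ and standard deviation $\sigma\sqrt{1/n_1 + 1/n_2}$. The Python sketch below evaluates that risk; it is an illustration only, and the function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def expected_decision(m1, m2, sigma, n1, n2):
    """E[f | omega] for the rule 'f = 1 if mean of sample 1 > mean of sample 2'."""
    standard_error = sigma * sqrt(1.0 / n1 + 1.0 / n2)
    return normal_cdf((m1 - m2) / standard_error)


def risk(m1, m2, sigma, n1, n2):
    """Risk (1): max[m1, m2] - m1 E[f | omega] - m2 E[1 - f | omega]."""
    p = expected_decision(m1, m2, sigma, n1, n2)
    return max(m1, m2) - m1 * p - m2 * (1.0 - p)
```

Evaluating `risk` over a grid of differences $m_1 - m_2$ shows the familiar shape: the risk vanishes when the means are equal, rises to a maximum at a moderate difference, and falls again as the populations become easy to tell apart.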

So much by way of a general summary. We shall now give a practical
illustration (another is given in Section 3) to show how the problem of the greater
mean arises in applications.

Suppose that a consumer requires a certain number of manufactured articles
which can be supplied at the same cost by each of two sources $\pi_1$ and $\pi_2$. The
quality of an article is measured by a numerical characteristic $x$, and it is known
that in the product of $\pi_i$, $x$ is normally distributed with mean $m_i$ and variance
$\sigma^2$, but the values of these parameters are unknown. The consumer has
obtained a random sample of $n_1$ and $n_2$ articles from $\pi_1$ and $\pi_2$ respectively, and
has found the values of $x$ to be $(x_{11}, x_{12}, \ldots, x_{1n_1}; x_{21}, x_{22}, \ldots, x_{2n_2}) = v$.

What is the best way of ordering a total of $N$ articles from the two sources?





The usual statistical theory, which confines itself to estimating the unknown
parameters and to testing hypotheses of the form $H_0(m_1 = m_2)$, has at best an
indirect bearing on the problem at hand. We therefore adopt Wald’s point of
view and investigate the consequences of any given course of action. If the
consumer orders $fN$ articles from $\pi_1$ and $(1 - f)N$ from $\pi_2$, where $0 \le f \le 1$,
then the expectation of the sum of the $x$-values in the articles he obtains will
be $N(m_1 f + m_2(1 - f))$. The maximum possible value of this quantity is
$N \max[m_1, m_2]$, and the “loss” per article which he sustains may therefore be
taken as

$W(\omega, f) = \max[m_1, m_2] - m_1 f - m_2(1 - f) \ge 0,$

where $\omega = (m_1, m_2 : \sigma)$ is the true parameter point.
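
The loss per article is a simple function of the two means and of the fraction ordered from the first source. The short Python sketch below, with hypothetical function names and made-up means in the usage note, evaluates both the expected total quality of an order and the resulting loss per article.

```python
def expected_total_quality(N, m1, m2, f):
    """Expected sum of x-values when f*N articles come from the first source
    and (1 - f)*N from the second."""
    return N * (m1 * f + m2 * (1.0 - f))


def loss_per_article(m1, m2, f):
    """Loss W(omega, f) = max[m1, m2] - m1*f - m2*(1 - f), for 0 <= f <= 1."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    return max(m1, m2) - m1 * f - m2 * (1.0 - f)
```

With illustrative means 10 and 8, ordering everything from the first source gives `loss_per_article(10.0, 8.0, 1.0) == 0.0`, ordering everything from the second gives a loss of 2 per article, and an even split gives a loss of 1.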

The consumer wants to choose $f$ so as to make $W$ as small as possible. If
he knew $m_1$ to be greater, or to be less, than $m_2$, then by choosing $f = 1$ or 0
respectively he could make $W = 0$. But since he does not know which $m_i$ is
the greater he will presumably choose $f$ as some function of the sample point $v$.
Suppose, therefore, that a “decision function” $f(v)$, such that $0 \le f(v) \le 1$ but
not necessarily taking on only the values 0 and 1, is defined for all points $v$ in
the sample space and that the consumer sets $f = f(v)$.² In repeated
applications of this procedure, the “risk” or expected loss (a double expectation is
involved: the expected loss for a given $f$ and the expected value of $f$ in using the
decision function $f(v)$) per article is given by (1), and the consumer will try to
find an $f(v)$ which minimizes this risk. Since the value of the risk depends on $\omega$,
it is necessary to specify which values of $\omega$ are to be regarded as possible in
the given problem; let the set of all such $\omega$ be denoted by $\Omega$. If the consumer
agrees to adopt the “conservative” criterion of minimizing the maximum
possible risk, then the statistician’s problem is to find the minimax decision
functions in the sense defined above. We have given the solutions of this problem
for certain types of parameter spaces. The reader will observe that each of the
minimax decision functions (2), (3), (4) was of the “all or nothing” type, with
values 0 and 1 only. (Whether this remains true for every $\Omega$ we do not know.)
By using one of these decision functions in a given instance one arrives at either
the best possible decision or the worst. The attitudes of doubt sometimes
associated with the non-rejection of the hypothesis $H_0(m_1 = m_2)$ are therefore

² One might say that the consumer should choose $f$ in the light of what he can infer from
$v$ about the $m_i$. But this formulation as a problem in ordinary statistical inference
(estimation and testing) is not relevant and may be misleading. For example, a plausible $f(v)$,
based on the idea that the problem is one of testing hypotheses, is as follows: “Perform the
two-tailed $t$ test of $H_0(m_1 = m_2)$ at the five per cent level. If $H_0$ is rejected set $f = 0$ or 1
according as $\bar{x}_1$ is less than or greater than $\bar{x}_2$. If $H_0$ is not rejected set $f = \tfrac{1}{2}$.” Another
$f(v)$, based on the theory of estimation, according to which the $\bar{x}_i$ are the “best” estimates
of the $m_i$, is as follows: “Set $f = 0$ or 1 according as $\bar{x}_1$ is less than or greater than $\bar{x}_2$.”
Actually, the latter procedure is, from the remarks above concerning (5), the “best” in
a certain definite sense and under certain conditions, but this fact does not follow from the
usual theory of estimation.





irrelevant to the problem of the greater mean in the examples cited. (Cf.
footnote 2; also Example 1 in Section 3.)

The risk function (1) is but one of a general class $R$ of risk functions, to be
defined in Section 2, which are associated with the problem of the greater mean.
The most important members of $R$ are (1) and

(6)  $\bar{r}(f \mid \omega) = P(\text{incorrect decision using } f(v) \mid \omega),$

where “$m_1 < m_2$” and “$m_1 > m_2$” are the two possible decisions. The risk
function (6) is relevant to applications of a purely “scientific” nature in which the
statistician is asked merely to give his opinion as to which population has
the greater mean. Although the problem of constructing a suitable decision
function for (6) is akin in spirit to the problems considered in the now classical
Neyman-Pearson theory of statistical tests, no satisfactory solutions seem to
be available. It is easy to see, however, that (1) and (6) are quite similar. Of
course, in the case of (1) a decision function $f(v)$ may take on any value
between 0 and 1 inclusive, while for (6) we allow only functions which take on
only the values 0 and 1, corresponding respectively to the decisions “$m_1 < m_2$”
and “$m_1 > m_2$”. We then have for any such $f(v)$,


(6')  $\bar{r}(f \mid \omega) = \begin{cases} P(f(v) = 1 \mid \omega) = E[f \mid \omega] & \text{if } m_1 < m_2, \\ P(f(v) = 0 \mid \omega) = E[1 - f \mid \omega] & \text{if } m_1 > m_2, \\ 0 & \text{if } m_1 = m_2, \end{cases}$

and by comparison with (1) we see that $r(f \mid \omega) = |m_1 - m_2| \, \bar{r}(f \mid \omega)$ for all $\omega$.
Now, in the three examples (i), (ii), (iii) cited above the unique minimax decision
functions happen to take on only the values 0 and 1, and $|m_1 - m_2|$ is constant
on each of the respective parameter sets. It follows that (2), (3), (4) are also
the unique minimax decision functions relative to (6) and to $\Omega'$, $\Omega''$, $\Omega'''$
respectively. The remarks above following Example (iii) also remain valid for the
risk function (6).
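
The proportionality between the two risk functions is a matter of simple algebra and can be checked numerically. In the Python sketch below (an illustration with hypothetical function names), the quantity `p` stands for $E[f \mid \omega]$, the probability that a 0-1 valued decision function takes the value 1.

```python
def risk_loss(m1, m2, p):
    """Risk (1) written in terms of p = E[f | omega]."""
    return max(m1, m2) - m1 * p - m2 * (1.0 - p)


def risk_error(m1, m2, p):
    """Risk (6): probability of an incorrect decision for a 0-1 valued f
    with P(f = 1 | omega) = p."""
    if m1 < m2:
        return p
    if m1 > m2:
        return 1.0 - p
    return 0.0


def proportionality_gap(m1, m2, p):
    """Difference between risk (1) and |m1 - m2| times risk (6); zero up to rounding."""
    return risk_loss(m1, m2, p) - abs(m1 - m2) * risk_error(m1, m2, p)
```

Evaluating `proportionality_gap` for any means and any `p` between 0 and 1 returns a value that is zero up to floating-point rounding, which is the relation stated after (6').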

We conclude this section with a remark on the methods of this paper. Any
decision function relevant to (6) is equivalent to a test of the hypothesis $H_0(m_1 <
m_2)$ against the alternative $H_1(m_1 > m_2)$, the region $\{v : f(v) = 1\}$ being the
“critical region.” Hence the Neyman-Pearson probability ratio method can be
used to obtain the unique minimax decision function with respect to (6) and
an $\Omega$ consisting of two (or more) points, and the result carries over to more
general types of $\Omega$ in the manner already indicated. It turns out, however, that
the dominant properties of the probability ratio tests are not confined to
the class of tests alone, but extend to the class of all functions $f(v)$ such that
$0 \le f(v) \le 1$. This result (Theorem 1) enables us to solve the problem of the
greater mean for the risk function (1) as well as for (6). The reader who is
interested in applications may turn to Section 3.


2. Theorems. We require the following slight generalization of a well-known
result of Neyman and Pearson [2].





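Theorem 1. Let $\phi(v), \phi_1(v), \ldots, \phi_r(v)$ be summable functions defined on
a measure space $E$ with points $v$ and measure $\mu$, $\mu(E) < \infty$, let $c_1, \ldots, c_r$ be
arbitrary constants, and let $A \subset E$ be such that

(7)  $v \in A$ implies $\phi(v) \ge \sum_{i=1}^{r} c_i \phi_i(v)$,
     $v \in E - A$ implies $\phi(v) \le \sum_{i=1}^{r} c_i \phi_i(v)$;

let

(8)  $\int_A \phi_i \, d\mu = a_i \quad (i = 1, \ldots, r),$

and let $f(v)$ be any measurable function such that

(9)  $0 \le f(v) \le 1$

and such that

(10)  $\int_E f \phi_i \, d\mu = a_i \quad (i = 1, \ldots, r).$

Then

(11)  $\int_E f \phi \, d\mu \le \int_A \phi \, d\mu.$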
Je Ja 

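Proof. Writing $\sum$ for $\sum_{i=1}^{r}$, we have

$\int_E f\phi \, d\mu = \int_A f\phi \, d\mu + \int_{E-A} f\phi \, d\mu$

  $\le \int_A f\phi \, d\mu + \int_{E-A} f \sum c_i\phi_i \, d\mu$    by (9), (7),

  $= \int_A f\phi \, d\mu + \sum c_i \int_{E-A} f\phi_i \, d\mu$

  $= \int_A f\phi \, d\mu + \sum c_i \left[\int_E f\phi_i \, d\mu - \int_A f\phi_i \, d\mu\right]$

  $= \int_A f\phi \, d\mu + \sum c_i \left[a_i - \int_A f\phi_i \, d\mu\right]$    by (10),

  $= \int_A f\phi \, d\mu + \sum c_i \left[\int_A (1 - f)\phi_i \, d\mu\right]$    by (8),

  $= \int_A \phi \, d\mu - \int_A (1 - f)\phi \, d\mu + \int_A \left(\sum c_i\phi_i\right)(1 - f) \, d\mu$

  $= \int_A \phi \, d\mu + \int_A (1 - f)\left(\sum c_i\phi_i - \phi\right) d\mu$

  $\le \int_A \phi \, d\mu$    by (9), (7).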





Note 1. If the condition

(12)  $\mu\{v : \phi(v) = \sum_{i=1}^{r} c_i \phi_i(v)\} = 0$

holds, then in order that the equality hold in (11) it is necessary and sufficient that

(13)  $f(v) = \chi_A(v)$ a.e. $(\mu)$,

where $\chi_A(v)$ is the characteristic function of the set $A$,

$\chi_A(v) = 1$ if $v \in A$,  $\chi_A(v) = 0$ if $v \in E - A$.

Proof. The sufficiency is obvious. To prove the necessity we observe from the
proof of Theorem 1 that for equality to hold in (11) it is necessary that

$f(v)\left(\phi(v) - \sum c_i\phi_i(v)\right) = 0$  a.e. $(\mu)$ in $E - A$,

and that

$(1 - f(v))\left(\phi(v) - \sum c_i\phi_i(v)\right) = 0$  a.e. $(\mu)$ in $A$.

These relations and (12) imply (13).

Note 2. If relations (10) are replaced by

(10')  $\int_E f\phi_i \, d\mu \le a_i$  $(i = 1, \ldots, r)$,

and if each of the constants $c_i$ is non-negative, then Theorem 1 and Note 1 remain
valid.
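
Theorem 1 in the relaxed form of Note 2 can be checked numerically on a small
finite measure space. The following is only a sketch under stated assumptions:
the measure is counting measure on four points, $r = 1$, and the values of
`phi`, `phi1` and `c1` are hypothetical numbers chosen for illustration, not
taken from the paper. Random functions $f$ with $0 \le f \le 1$ are shrunk
until they satisfy (10'), and the smallest observed difference between the two
sides of (11) is reported.

```python
import random


def check_theorem1(phi, phi1, c1, trials=1000, seed=0):
    """Compare sum(f * phi) with sum(phi over A) for random f in [0, 1]
    that satisfy the relaxed side condition (10') with r = 1.

    Assumes phi1 >= 0 pointwise and c1 >= 0 (the setting of Note 2)."""
    rng = random.Random(seed)
    n = len(phi)
    # A = set of points where phi >= c1 * phi1, as in (7).
    A = [i for i in range(n) if phi[i] >= c1 * phi1[i]]
    a1 = sum(phi1[i] for i in A)      # (8) under counting measure
    best = sum(phi[i] for i in A)     # right-hand side of (11)
    worst_gap = float("inf")
    for _ in range(trials):
        f = [rng.random() for _ in range(n)]
        used = sum(f[i] * phi1[i] for i in range(n))
        if used > a1:
            # Shrink f so that sum(f * phi1) <= a1; f stays inside [0, 1].
            scale = a1 / used
            f = [scale * x for x in f]
        value = sum(f[i] * phi[i] for i in range(n))   # left-hand side of (11)
        worst_gap = min(worst_gap, best - value)
    return best, worst_gap


if __name__ == "__main__":
    best, gap = check_theorem1([0.1, 0.4, 0.2, 0.3], [0.4, 0.1, 0.3, 0.2], 1.0)
    print("integral of phi over A:", best)
    print("smallest observed (RHS - LHS) in (11):", gap)
```

A non-negative smallest gap is what (11) predicts; a random search of this kind
illustrates the inequality but is of course no substitute for the proof.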

Theorem 1 has applications to a number of decision problems of a certain
type. In the present paper we consider only the "problem of the greater mean"
for two normal populations with a common variance $\sigma^2$, where at least one of
the means $m_1$, $m_2$ is unknown. The following assumptions and definitions will
be valid henceforth.

(A) $E_N$ is the $N = n_1 + n_2$ dimensional sample space of points
$v = (x_{11}, x_{12}, \ldots, x_{1n_1}; x_{21}, x_{22}, \ldots, x_{2n_2})$. A measurable function $f(v)$
defined for all $v$ in $E_N$ is a decision function if $0 \le f(v) \le 1$. $f_1(v) \equiv f_2(v)$ means
$f_1(v) = f_2(v)$ for almost every $v$ in $E_N$.

(B) $\Omega$ is a given set of points $\omega = (m_1, m_2 : \sigma)$, $\sigma > 0$. Given $\omega$ in $\Omega$, the
probability measure in $E_N$ is that generated by the distribution function

$K(v \mid \omega) = \prod_{i=1}^{2} \prod_{j=1}^{n_i} G\!\left(\frac{x_{ij} - m_i}{\sigma}\right)$,

where

$G(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-u^2/2} \, du$.





Given any function $\phi = \phi(v)$ for which the integral exists we write

$E[\phi \mid \omega] = \int_{E_N} \phi(v) \, dK(v \mid \omega)$.

(C) Let $\gamma(\omega) = (g_1, g_2)$ be a function defined for all $\omega$ in $\Omega$, with values in
$E_2$, and such that

(14)  $m_i < m_j$ implies $g_i \le g_j$  $(i, j = 1, 2)$.

Given $p$, $0 \le p \le 1$, we define

$W(\omega, p) = \max[g_1, g_2] - g_1 p - g_2(1 - p)$,

and given a decision function $f(v)$ we define the risk function

(15)  $r(f \mid \omega) = E[W(\omega, f) \mid \omega] = W(\omega, E[f \mid \omega])$
      $= \max[g_1, g_2] - g_1 E[f \mid \omega] - g_2 E[1 - f \mid \omega]$.

The class of risk functions (15) corresponding to all functions $\gamma(\omega)$ which satisfy
(14) is denoted by $R$. (The two most important members of $R$ are (1), with

$\gamma(\omega) = (m_1, m_2)$,

and (6), with

$\gamma(\omega) = (0, 1)$ if $m_1 < m_2$,  $(1, 0)$ if $m_1 > m_2$,  $(0, 0)$ if $m_1 = m_2$.

The risk functions (1) and (6) appear in the examples in Section 3.) Throughout
this section $r(f \mid \omega)$ will denote a fixed but arbitrary member of $R$. We shall use
the notations

$h(\omega) = |g_1 - g_2|$,

$d(\omega) = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} (m_1 - m_2)/\sigma$,

$\bar x_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}$  $(i = 1, 2)$.

Theorem 2. Let $\omega_1 = (m_1, m_2 : \sigma)$ and $\omega_2 = (\mu_1, \mu_2 : \sigma)$ be parameter points
such that

$d(\omega_1) < 0$,  $d(\omega_2) > 0$,  $h(\omega_1)h(\omega_2) > 0$.

For any $\lambda$, $-\infty \le \lambda \le \infty$, let $f_\lambda(v)$ be the characteristic function of the set

(16)  $A_\lambda = \{v : n_1(\mu_1 - m_1)\bar x_1 + n_2(\mu_2 - m_2)\bar x_2 > \lambda\sigma\}$.

Then

(i) Corresponding to any decision function $f(v)$, there exists a $\lambda$ such that

$r(f_\lambda \mid \omega_1) = r(f \mid \omega_1)$,  $r(f_\lambda \mid \omega_2) \le r(f \mid \omega_2)$;

the inequality is strict unless $f(v) \equiv f_\lambda(v)$.

(ii) Given any $\lambda$, if $f(v)$ is a decision function such that

$r(f \mid \omega_i) \le r(f_\lambda \mid \omega_i)$  $(i = 1, 2)$,

then

$f(v) \equiv f_\lambda(v)$.

(iii) There exists a unique $c$ such that

(17)  $r(f_c \mid \omega_1) = r(f_c \mid \omega_2) = B$, say,

and for any decision function $f(v)$ we have

(18)  $B \le \max[r(f \mid \omega_1), r(f \mid \omega_2)]$;

the inequality is strict unless $f(v) \equiv f_c(v)$. It follows that $f_c(v)$ is the unique minimax
decision function corresponding to the two-point parameter space $\Omega = (\omega_1, \omega_2)$.
Proof.* (a) Let $\phi(v)$, $\phi_1(v)$ be the joint frequency functions of the sample
point $v$ corresponding to the parameter points $\omega_2$, $\omega_1$ respectively. It is readily
seen that for any $\lambda$ there exists a unique constant $c_1(\lambda)$, $0 \le c_1(\lambda) \le \infty$, such
that

$A_\lambda = \{v : \phi(v) > c_1\phi_1(v)\}$

$(c_1(-\infty) = 0,\ c_1(\infty) = \infty)$. Moreover, since $\omega_1 \ne \omega_2$,

$\mu\{v : \phi(v) = c_1\phi_1(v)\} = 0$.

It follows from Theorem 1, Note 2, that if $f(v)$ is any decision function such
that

$E[f \mid \omega_1] \le E[f_\lambda \mid \omega_1]$,

then

$E[f \mid \omega_2] \le E[f_\lambda \mid \omega_2]$,

and the strict inequality holds unless $f(v) \equiv f_\lambda(v)$.

(b) It is clear from the definition (16) that for any fixed parameter point $\omega$
the function

$E[f_\lambda \mid \omega] = P(A_\lambda \mid \omega)$

is continuous and strictly decreasing from 1 to 0 as $\lambda$ varies from $-\infty$ to $+\infty$.

(c) For any decision function $f(v)$ and any parameter point $\omega$ we have by (C),

$r(f \mid \omega) = \max[g_1, g_2] - g_1 E[f \mid \omega] - g_2 E[1 - f \mid \omega]$.

Hence

(19)  $r(f \mid \omega_1) = h(\omega_1) E[f \mid \omega_1]$,  $h(\omega_1) > 0$,
      $r(f \mid \omega_2) = h(\omega_2) E[1 - f \mid \omega_2]$,  $h(\omega_2) > 0$.

* Theorem 2 (as also Example (iii) of Section 1) could be derived from Wald’s general 
results on the completeness of the class of Bayes solutions of statistical decision problems. 





Since for any decision function $f(v)$, $0 \le E[f \mid \omega_1] \le 1$, we can by (b) choose $\lambda$
so that

(20)  $E[f_\lambda \mid \omega_1] = E[f \mid \omega_1]$,

and by (a) it follows that unless $f(v) \equiv f_\lambda(v)$,

(21)  $E[f_\lambda \mid \omega_2] > E[f \mid \omega_2]$.

(i). Follows from (19), (20) and (21).

(ii). Follows from (19) and (a).

(iii). (17) follows from (19) and (b). Then (18) follows from (17) and (ii).

Theorem 2 provides the solution of any problem of the greater mean when $\Omega$
consists of just two points $\omega_1$, $\omega_2$. For, the problem is trivial unless
$d(\omega_1) d(\omega_2) < 0$ and $h(\omega_1)h(\omega_2) > 0$, and in the non-trivial case the unique
minimax decision function is $f_c(v)$ defined by (17). Moreover, it follows at once
from the definition that if $f(v)$ is the unique minimax decision function with
respect to some parameter set $\Omega$, then it remains so with respect to any
$\bar\Omega$ such that $\bar\Omega \supseteq \Omega$ and

$\sup_{\omega \in \bar\Omega} [r(f \mid \omega)] = \sup_{\omega \in \Omega} [r(f \mid \omega)]$.

By taking sets $\Omega$ which consist of two points, Theorem 2 can therefore be used
to obtain sufficient conditions for an $f(v) \equiv f_c(v)$ to be the unique minimax
decision function with respect to a quite general $\bar\Omega$. (It is clear that results
analogous to Theorem 2(iii) but pertaining to more than two parameter points
can be derived from Theorem 1, and that these results can be exploited in a
similar way. An instance of this procedure where $\Omega$ consists of three points will
be given at the end of this section.)
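
The equalizing rule of Theorem 2(iii) can be located numerically. The sketch
below is an illustration under assumptions, not the authors' procedure: it
works directly with a threshold `t` on the linear statistic
$T = n_1(\mu_1 - m_1)\bar x_1 + n_2(\mu_2 - m_2)\bar x_2$ (so `t` stands in for
the right-hand side of (16)), uses the fact that $T$ is normal under each
parameter point, and finds by bisection the `t` at which the two risks in (19)
are equal. The function name, the argument layout and the default weights
`h1 = h2 = 1` are hypothetical.

```python
import math


def G(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def two_point_minimax(w1, w2, sigma, n1, n2, h1=1.0, h2=1.0):
    """Return (t, B): threshold on T and the common risk B at w1 and w2.

    w1 = (m1, m2) with m1 < m2, w2 = (mu1, mu2) with mu1 > mu2."""
    (m1, m2), (u1, u2) = w1, w2
    a1, a2 = n1 * (u1 - m1), n2 * (u2 - m2)       # coefficients of xbar1, xbar2
    sd = sigma * math.sqrt(n1 * (u1 - m1) ** 2 + n2 * (u2 - m2) ** 2)
    mean1 = a1 * m1 + a2 * m2                     # E[T | w1]
    mean2 = a1 * u1 + a2 * u2                     # E[T | w2]

    def gap(t):
        risk1 = h1 * (1.0 - G((t - mean1) / sd))  # h(w1) * E[f | w1]
        risk2 = h2 * G((t - mean2) / sd)          # h(w2) * E[1 - f | w2]
        return risk1 - risk2                      # strictly decreasing in t

    lo = min(mean1, mean2) - 10.0 * sd
    hi = max(mean1, mean2) + 10.0 * sd
    for _ in range(200):                          # plain bisection
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return t, h1 * (1.0 - G((t - mean1) / sd))


if __name__ == "__main__":
    t, B = two_point_minimax((0.0, 1.0), (1.0, 0.0), sigma=1.0, n1=4, n2=4)
    print("threshold:", t, "common risk B:", B)
```

In the symmetric case used in the usage lines the threshold comes out at zero,
so the rule is simply "decide for the population with the larger sample mean".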

The theorems which follow exploit Theorem 2 in this way to obtain conditions
on $\Omega$ under which the decision functions $f^0(v)$ and $f_c^0(v)$ defined by (5) and (3)
are minimax. We consider $f^0(v)$ first. From (C) we have, after a simple
computation,

(22)  $r(f^0 \mid \omega) = h(\omega) \cdot G(-|d(\omega)|)$.
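
Formula (22) is easy to evaluate directly. The sketch below assumes only the
definitions of $G$, $d(\omega)$ and $h(\omega)$ given in (B) and (C); the
function names `risk_1` and `risk_6` are hypothetical labels for the risk
functions (1) and (6), for which $h(\omega) = |m_1 - m_2|$ and
$h(\omega) = 1$ (or 0 when $m_1 = m_2$) respectively.

```python
import math


def G(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def d(m1, m2, sigma, n1, n2):
    """d(omega) = (1/n1 + 1/n2)^(-1/2) * (m1 - m2) / sigma."""
    return (m1 - m2) / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))


def risk_f0(m1, m2, sigma, n1, n2, h):
    """Risk (22) of the rule 'choose population 1 iff xbar1 > xbar2'."""
    return h * G(-abs(d(m1, m2, sigma, n1, n2)))


def risk_1(m1, m2, sigma, n1, n2):
    """Risk function (1): gamma(omega) = (m1, m2), so h = |m1 - m2|."""
    return risk_f0(m1, m2, sigma, n1, n2, abs(m1 - m2))


def risk_6(m1, m2, sigma, n1, n2):
    """Risk function (6): probability of a wrong decision; h is 0 or 1."""
    return risk_f0(m1, m2, sigma, n1, n2, 0.0 if m1 == m2 else 1.0)


if __name__ == "__main__":
    print(risk_1(1.0, 0.0, 1.0, 2, 2))   # here d = 1, so the value is G(-1)
    print(risk_6(1.0, 0.0, 1.0, 2, 2))
```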

Theorem 3. Suppose that there exist sequences $\{\omega_k\}$, $\{\omega_k'\}$ of points
$\omega_k = (m_{1k}, m_{2k} : \sigma_k)$, $\omega_k' = (\mu_{1k}, \mu_{2k} : \sigma_k)$ in $\Omega$ such that

(i) $\lim_{k \to \infty} r(f^0 \mid \omega_k) = \sup_{\omega \in \Omega} [r(f^0 \mid \omega)]$  $(\ne 0, \infty)$,

(ii) $d(\omega_k) = -d(\omega_k')$, $h(\omega_k) = h(\omega_k')$, and
$n_1 m_{1k} + n_2 m_{2k} = n_1 \mu_{1k} + n_2 \mu_{2k}$ for every $k = 1, 2, \ldots$.

Then $f^0(v)$ is an admissible minimax decision function. If there exist
$\omega_0 = (m_1, m_2 : \sigma)$, $\omega_0' = (\mu_1, \mu_2 : \sigma)$ in $\Omega$ satisfying (i) and (ii), then $f^0(v)$ is the
unique minimax decision function.

Proof. By (22) and (ii),

(23)  $r(f^0 \mid \omega_k) = r(f^0 \mid \omega_k')$ for every $k$.

Without loss of generality, we may assume the two sequences to be so chosen
that $h(\omega_k) = h(\omega_k') > 0$ for every $k$. Then, by interchanging corresponding
members if necessary, we may assume that

(24)  $d(\omega_k) = -d(\omega_k') < 0$ for every $k$.

Consider the two points $\omega_k$, $\omega_k'$ in $\Omega$ with arbitrary but fixed $k$. Writing $\omega_k$, $\omega_k'$
for $\omega_1$, $\omega_2$ respectively, and using conditions (ii), a simple calculation shows
that the set defined by (16) is

(25)  $A_\lambda = \{v : \bar x_1 - \bar x_2 > L\}$,

$L$ being a strictly increasing function of $\lambda$.

Choose and fix an arbitrary decision function $f(v) \not\equiv f^0(v)$. Comparing (5) and
(25), it follows from Theorem 2(iii) and (23) that

(26)  $r(f^0 \mid \omega_k) = r(f^0 \mid \omega_k') < \max[r(f \mid \omega_k), r(f \mid \omega_k')]$.

Clearly, $f(v)$ cannot be uniformly better than $f^0(v)$ in $\Omega$. Again, from (26),

(27)  $r(f^0 \mid \omega_k) < \sup_{\omega} [r(f \mid \omega)]$,

so that, since $k$ is arbitrary,

(28)  $\sup_{\omega} [r(f^0 \mid \omega)] = \lim_{k \to \infty} r(f^0 \mid \omega_k) \le \sup_{\omega} [r(f \mid \omega)]$.

Since $f(v) \not\equiv f^0(v)$ in the preceding argument is arbitrary, we have shown that
(a) no $f(v)$ can be uniformly better than $f^0(v)$ and (b)
$\sup_{\omega} [r(f^0 \mid \omega)] = \inf_f \sup_{\omega} [r(f \mid \omega)]$, i.e. that $f^0(v)$ is admissible and minimax.
The last part of the theorem follows upon setting $\omega_k = \omega_0$ in (27). This
completes the proof of Theorem 3.

The conditions on $\Omega$ for $f^0(v)$ to be the unique minimax decision function may
be written as follows:

There exist $\omega_0 = (m_1, m_2 : \sigma)$, $\omega_0' = (\mu_1, \mu_2 : \sigma)$ in $\Omega$ such that

(29)  (i) $r(f^0 \mid \omega_0)\ (= r(f^0 \mid \omega_0')) = \sup_{\omega \in \Omega} [r(f^0 \mid \omega)]$  $(\ne 0, \infty)$,

      (ii) $\mu_1 = m_1 + \left(\frac{2n_2}{n_1 + n_2}\right)(m_2 - m_1)$,  $\mu_2 = m_2 + \left(\frac{2n_1}{n_1 + n_2}\right)(m_1 - m_2)$,

      (iii) $h(\omega_0) = h(\omega_0')$.

For the important risk functions (1) and (6), (29)(ii) implies (29)(iii) (i.e. $h(\omega)$
depends on $|m_1 - m_2|$ alone). Moreover, when $n_1 = n_2$, (29)(ii) becomes $\mu_1 = m_2$,
$\mu_2 = m_1$. Thus for (1) and (6), when $n_1 = n_2$ the conditions (29) reduce simply
to the condition that at least two points in $\Omega$ at which the risk for $f^0(v)$ is maximum
be image points of one another in the plane $\{\omega : m_1 = m_2\}$. In particular, it follows
that if $n_1 = n_2$ and if the given set $\Omega$ is "symmetric" in the sense that whenever
$(m_1, m_2 : \sigma)$ is in $\Omega$ then $(m_2, m_1 : \sigma)$ is also in $\Omega$, then $f^0(v)$ is the unique minimax
decision function provided that it attains its maximum risk in $\Omega$, the risk function
in question being (1) or (6). There are obvious modifications (involving two
sequences of points in $\Omega$) of these remarks which assert that $f^0(v)$ is at least an
admissible minimax decision function in case $f^0(v)$ does not attain its maximum
risk in $\Omega$.
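
The companion point required by Theorem 3 can be computed by solving the two
linear conditions in (ii) of that theorem, namely $\mu_1 - \mu_2 = -(m_1 - m_2)$
and $n_1\mu_1 + n_2\mu_2 = n_1 m_1 + n_2 m_2$. The following sketch does this;
the function name `mirror_point` is a hypothetical label, and the closed form
in the code is obtained here from those two conditions rather than quoted from
the text.

```python
def mirror_point(m1, m2, n1, n2):
    """Return (mu1, mu2) with mu1 - mu2 = -(m1 - m2) and
    n1*mu1 + n2*mu2 = n1*m1 + n2*m2 (conditions (ii) of Theorem 3)."""
    total = n1 + n2
    mu1 = m1 + 2.0 * n2 / total * (m2 - m1)
    mu2 = m2 + 2.0 * n1 / total * (m1 - m2)
    return mu1, mu2


if __name__ == "__main__":
    # Equal sample sizes: the companion point is the mirror image (m2, m1).
    print(mirror_point(0.0, 1.0, 5, 5))
    # Unequal sample sizes: the point is shifted so that the weighted mean
    # n1*m1 + n2*m2 is preserved.
    print(mirror_point(0.0, 1.0, 2, 1))
```

For unequal sample sizes the companion point is no longer the reflection of
$(m_1, m_2)$ in the line $m_1 = m_2$, which is why a "symmetric" $\Omega$ alone
does not suffice in that case.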

We shall now state the result analogous to Theorem 3 for the case when one
of the means is known exactly, say $m_2 = c$. The decision function $f_c^0(v)$ is defined
by (3).

Theorem 4. Suppose that there exist sequences $\{\omega_k\}$, $\{\omega_k'\}$ of points
$\omega_k = (c + a_k, c : \sigma_k)$, $\omega_k' = (c - a_k, c : \sigma_k)$ in $\Omega$ such that

(i) $\lim_{k \to \infty} r(f_c^0 \mid \omega_k) = \sup_{\omega \in \Omega} [r(f_c^0 \mid \omega)]$  $(\ne 0, \infty)$,

(ii) $h(\omega_k) = h(\omega_k')$ for every $k = 1, 2, \ldots$.

Then $f_c^0(v)$ is an admissible minimax decision function. If there exist
$\omega_0 = (c + a, c : \sigma)$, $\omega_0' = (c - a, c : \sigma)$ in $\Omega$ satisfying (i) and (ii), then $f_c^0(v)$ is the
unique minimax decision function.

The proof (based on Theorem 2(iii)) is similar to that of Theorem 3 and will
be omitted. Note that for the risk functions (1) and (6), condition (ii) is
automatically satisfied.

The reader will have observed that results which may be obtained from
Theorem 2(iii) in the manner of Theorems 3 and 4 will assert the optimal
character of decision functions which are characteristic functions of sets of the type
$\{v : a\bar x_1 + b\bar x_2 > c\}$. The following example, cited as Example (iii) of Section 1,
shows that for arbitrary $\Omega$ the optimum decision function need not be of this
type.

Suppose that $n_1 = n_2 = n$, that $\Omega$ consists of the three points

“0 = (i) ~ 2 : l)i “1 “ (f) “2 = (~t) ~2 - l)j 

and that the risk function under consideration is given by (1) or (6). Then the
unique minimax decision function is $f^{**}(v)$ given by (4), where $\lambda > 0$ is
determined by

(30)  $E[1 - f^{**} \mid \omega_0] = E[f^{**} \mid \omega_1]$.

The proof follows. $f^{**}(v)$ is the characteristic function of the set
$\{v : \phi(v) > c_1\phi_1(v) + c_2\phi_2(v)\}$, where $\phi$, $\phi_1$, $\phi_2$ are the frequency functions of the probability
distributions in $E_{2n}$ corresponding to the parameter points $\omega_0$, $\omega_1$, $\omega_2$ respectively,
with Cl = C 2 = e^/X. Since for all X > 0, 

$E[f^{**} \mid \omega_1] = E[f^{**} \mid \omega_2]$,

and since a unique $\lambda > 0$ satisfying (30) certainly exists, it follows (cf. (19) and
(C)) that

$r(f^{**} \mid \omega_0) = r(f^{**} \mid \omega_1) = r(f^{**} \mid \omega_2) = B$,

say. Let $f(v)$ be any decision function $\not\equiv f^{**}(v)$. We shall show that

(31)  $B < \max[r(f \mid \omega_0), r(f \mid \omega_1), r(f \mid \omega_2)]$.

Suppose not. Then

$r(f \mid \omega_1) = E[f \mid \omega_1] \le E[f^{**} \mid \omega_1] = r(f^{**} \mid \omega_1)$,

$r(f \mid \omega_2) = E[f \mid \omega_2] \le E[f^{**} \mid \omega_2] = r(f^{**} \mid \omega_2)$.

Then, by Theorem 1, Note 2, we must have $E[f \mid \omega_0] < E[f^{**} \mid \omega_0]$, so that

$r(f \mid \omega_0) = 1 - E[f \mid \omega_0] > 1 - E[f^{**} \mid \omega_0] = r(f^{**} \mid \omega_0) = B$,

contrary to hypothesis. Hence (31) holds, and since $f(v) \not\equiv f^{**}(v)$ is arbitrary
our assertion is proved. (Note that

$r(f^0 \mid \omega_0) = r(f^0 \mid \omega_1) = r(f^0 \mid \omega_2)$

also, so that $f^{**}(v)$ is uniformly better than $f^0(v)$ in $\Omega$.) We remind the reader
that $f^{**}(v)$ remains the unique minimax decision function with respect to (1)
or (6) and any $\bar\Omega$ which contains $\omega_0$, $\omega_1$, $\omega_2$ and is such that
$\sup_{\omega \in \bar\Omega} [r(f^{**} \mid \omega)] = B$.
Whether a set $\bar\Omega$ satisfies the last condition will in general depend on whether the
risk function in question is (1) or (6).

3. Examples and discussion. In this section we shall discuss the relevance of
Theorems 3 and 4 to two specific problems of the greater mean. The examples
given are purely illustrative and the reader will readily construct others in which
the statistician is faced with similar problems of decision.

Example 1. A farmer F has tested two varieties $\pi_1$, $\pi_2$ of grain in a field
experiment in which $n_i$ plots were assigned to $\pi_i$, $i = 1, 2$, all plots being of equal
area. The plot yields obtained were $y_{11}, y_{12}, \ldots, y_{1n_1}$ and $y_{21}, y_{22}, \ldots, y_{2n_2}$
bushels respectively. F gives this data to a statistician S for analysis. F is willing
to assume that the yields per plot for each of the two varieties are normally
distributed with unknown means and a common variance, also unknown.
F says he is particularly interested in whether the two varieties are "significantly
different."

S is well aware that F's interest in the varieties is not purely scientific; that
is to say, F did not perform the field experiment for the sole purpose of estimating
the unknown parameters or testing hypotheses concerning them. S also knows
that it is very unlikely that $\mu_1$ is equal to $\mu_2$.

Suppose that in fact F wishes to decide which variety he should use next
year on his land in order to make the maximum possible profit, and is afraid
that if he were to act as if the observed mean yields $\bar y_1$, $\bar y_2$ were the true
population mean yields, he might make a gross error. So F is willing to compromise
between the two varieties (that is, he will assign some fraction $f$ of his land to
$\pi_1$ and the rest to $\pi_2$) in case S declares that there is no evidence of the two
varieties being different.

If this is the case, S should ask F how much it costs him to use $\pi_i$, and the
price at which he expects to sell his grain. Supposing that these quantities are
$a_i$ dollars per acre and $b$ dollars per bushel respectively, and that the area of each
plot in the field experiment was $c$ acres, S will set

$m_i$ = expected profit per acre in using variety $\pi_i$
      $= (b/c)\mu_i - a_i$ dollars  $(i = 1, 2)$,

$\omega = (m_1, m_2 : \sigma)$, $\sigma^2$ being the variance of the profit per acre
      in using $\pi_i$ $(i = 1, 2)$,

$\gamma(\omega) = (m_1, m_2)$ (see Section 2, (C)),

$x_{ij} = (b/c)y_{ij} - a_i$,  $\bar x_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$,  $v = (x_{11}, \ldots, x_{1n_1}; x_{21}, \ldots, x_{2n_2})$,

so that $r(f \mid \omega)$ is given by (1) and is equal to the expected loss (in terms of profit
per acre) incurred by using the proportions $f(v)$, $1 - f(v)$ of the varieties $\pi_1$, $\pi_2$
as compared with using the variety with the greater mean for the whole of the
land. Then if S is satisfied that the set of possible points $\omega$ satisfies the
conditions of Theorem 3 he should recommend that F use $\pi_1$ alone if $\bar x_1 > \bar x_2$, and
$\pi_2$ alone if $\bar x_2 > \bar x_1$, this being the safest procedure in the sense that it is the
minimax strategy (cf. Example 1 in [3]).
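
The recipe of Example 1 can be sketched in a few lines. Everything numerical
below is hypothetical (the yields, the costs `a1`, `a2`, the price `b` and the
plot area `c` are invented for illustration), and the function names are not
from the paper; the code only carries out the transformation
$x_{ij} = (b/c)y_{ij} - a_i$ and the comparison of $\bar x_1$ with $\bar x_2$.

```python
from statistics import mean


def profit_per_acre(yields, a_i, b, c):
    """Convert plot yields (bushels per plot) into profit per acre (dollars):
    x_ij = (b / c) * y_ij - a_i."""
    return [(b / c) * y - a_i for y in yields]


def recommend(y1, y2, a1, a2, b, c):
    """Return (f, xbar1, xbar2), where f = 1.0 means 'use variety 1 alone'
    and f = 0.0 means 'use variety 2 alone'."""
    xbar1 = mean(profit_per_acre(y1, a1, b, c))
    xbar2 = mean(profit_per_acre(y2, a2, b, c))
    f = 1.0 if xbar1 > xbar2 else 0.0
    return f, xbar1, xbar2


if __name__ == "__main__":
    # Hypothetical data: variety 2 yields more but costs more to grow.
    y1 = [10.0, 12.0, 11.0]
    y2 = [11.0, 13.0, 12.0]
    f, xbar1, xbar2 = recommend(y1, y2, a1=20.0, a2=40.0, b=2.0, c=0.5)
    print("fraction of land for variety 1:", f)
    print("mean profit per acre:", xbar1, xbar2)
```

With these invented figures the variety with the lower mean yield is the one
recommended, because the comparison is made on profit per acre and not on yield.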

We shall illustrate by a simple example the obvious method of verifying
whether $f^0(v)$ is the minimax decision function for a given $\Omega$. We have by (22),
using the risk function (1) obtained by setting $\gamma(\omega) = (m_1, m_2)$,

(32)  $r(f^0 \mid \omega) = h(\omega)\,G(-|d(\omega)|)$
      $= |m_1 - m_2|\; G\!\left(-\left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} |m_1 - m_2| / \sigma\right)$.

Now suppose that

(33)  $\Omega = \{\omega : a - l/2 \le m_1 \le a + l/2,\ b - l/2 \le m_2 \le b + l/2,\ \sigma_0 - p \le \sigma \le \sigma_0\}$,

where $a$, $b$, $l$, $\sigma_0$, $p$ $(> 0)$ are certain constants. By (32), the maximum risk occurs
at some points in $\Omega$ for which $\sigma = \sigma_0$. We have

(34)  $r(f^0 \mid \sigma = \sigma_0) = \sigma_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2} [x\,G(-x)]$,

where

$x = x(\omega) = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} |m_1 - m_2| / \sigma_0$.

If $a = b$ and $n_1 = n_2$ we see from the remark following (29) that $f^0(v)$ is the unique
minimax decision function. Suppose therefore that $a \ne b$ or $n_1 \ne n_2$ or both.
Now

(35)  $\sup_{x} [x\,G(-x)] = x_0\,G(-x_0) = .1700$ (approx.),

where $x_0 = .7518$ (approx.). If $m_1$, $m_2$ were unrestricted, $r(f^0 \mid \sigma = \sigma_0)$ would
be a maximum when $|m_1 - m_2| = x_0\sigma_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2}$, by (34) and (35). Hence $f^0(v)$
will be the unique minimax decision function if these two lines intersect the square
$a - l/2 \le m_1 \le a + l/2$, $b - l/2 \le m_2 \le b + l/2$ in such a way that at least two
points lying on these lines and in the square satisfy (29)(ii). This will be the case if

(36)  $l > \max\left[\,|a - b| + y_0,\ \max(|a - b|,\ y_0) + \frac{|n_1 - n_2|}{n_1 + n_2}\, y_0\right]$,

where

$y_0 = x_0\sigma_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2}$.


We have assumed that $l > |a - b|$, for otherwise either $m_1 < m_2$ or $m_1 > m_2$
for all $\omega$ in $\Omega$, and there is no problem. It is therefore clear that for $n_1$ and $n_2$
sufficiently large, $f^0(v)$ will be the unique minimax decision function. That (36)
is not a very strong requirement may be seen by setting $a = b$, $n_1 = 2n_2$, in
which case (36) reduces to


Z > Co 



(approx.). 
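
The constants in (35) can be reproduced numerically. The sketch below assumes
only that $G$ is the standard normal distribution function; it locates the
maximum of $x\,G(-x)$ by bisection on the derivative $G(-x) - x\,g(x)$, where
$g$ is the standard normal density. The names `density`, `stationarity` and
`find_x0` are hypothetical.

```python
import math


def G(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def density(x):
    """Standard normal density g(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def stationarity(x):
    """Derivative of x * G(-x) with respect to x."""
    return G(-x) - x * density(x)


def find_x0(lo=0.1, hi=2.0, iterations=100):
    """Bisection for the root of the derivative; it is positive at lo and
    negative at hi, and has a single sign change on this interval."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if stationarity(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    x0 = find_x0()
    print("x0 =", x0)
    print("x0 * G(-x0) =", x0 * G(-x0))
```

The values obtained agree with those quoted in (35) to the four decimal places
given there.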


We remark that $f^0(v)$ remains the unique minimax decision function for any
$n_1$, $n_2$ "when $l = \infty$" so that $\Omega$ is given by

(33')  $\Omega = \{\omega : -\infty < m_1 < \infty,\ -\infty < m_2 < \infty;\ \sigma_0 - p \le \sigma \le \sigma_0\}$.


It is of interest to consider the "one sample" case when one of the means is
known, say $m_2 = c$. This will be the case (approximately) if $\pi_2$ is a standard
variety which has been in use for some time and $\pi_1$ is a new variety. The analogue
of the parameter space discussed above is then

(37)  $\Omega = \{\omega : m_2 = c,\ a - l/2 \le m_1 \le a + l/2;\ \sigma_0 - p \le \sigma \le \sigma_0\}$.

By using Theorem 4 it can be seen that $f_c^0(v)$ as defined by (3) is the unique
minimax decision function if $c = a$ or if $c$ is not necessarily equal to $a$, but

(38)  $\frac{l}{2} - |a - c| > x_0\sigma_0 \left(\frac{1}{n_1}\right)^{1/2}$,

where $x_0$ is given by (35). Since the left-hand side of (38) is positive, it is clear
that $f_c^0(v)$ will be the unique minimax decision function with respect to (37) if
$n_1$ is sufficiently large. Note that $f_c^0(v)$ is the unique minimax decision function
for any $n_1$ when $l = \infty$ and $\Omega$ is given by

(37')  $\Omega = \{\omega : m_2 = c,\ -\infty < m_1 < \infty;\ \sigma_0 - p \le \sigma \le \sigma_0\}$.

The reader may find it instructive to consider other plausible sets $\Omega$ which
satisfy the conditions of Theorems 3 and 4 and also some which do not, assuming
$\sigma = 1$ for simplicity. It should be observed that no matter what $\Omega$ may be,
provided only that $\sigma \le \sigma_0$ for all $\omega$ in $\Omega$, we shall have by (32) and (35)

$\sup_{\omega \in \Omega} [r(f^0 \mid \omega)] \le .1700\,\sigma_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2}$ (approx.).

In a similar way it can be seen that for any $\Omega$ in which $m_2$ equals $c$ and $\sigma \le \sigma_0$,

$\sup_{\omega \in \Omega} [r(f_c^0 \mid \omega)] \le .1700\,\sigma_0 \left(\frac{1}{n_1}\right)^{1/2}$ (approx.).
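
The first of these bounds can be checked by brute force for the risk function
(1). The sketch below is an illustration only: the values of `sigma0`, `n1`,
`n2` in the usage lines are hypothetical, and the supremum is approximated on a
finite grid of values of $|m_1 - m_2|$ at $\sigma = \sigma_0$, where by (32)
the risk is largest.

```python
import math


def G(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def sup_risk_on_grid(sigma0, n1, n2, max_gap=10.0, steps=20000):
    """Largest value of |m1 - m2| * G(-|d|) over a grid of gaps |m1 - m2|,
    evaluated at sigma = sigma0 (risk function (1), formula (32))."""
    scale = math.sqrt(1.0 / n1 + 1.0 / n2)
    best = 0.0
    for i in range(steps + 1):
        gap = max_gap * i / steps
        best = max(best, gap * G(-gap / (sigma0 * scale)))
    return best


def upper_bound(sigma0, n1, n2):
    """The bound .1700 * sigma0 * (1/n1 + 1/n2)^(1/2)."""
    return 0.1700 * sigma0 * math.sqrt(1.0 / n1 + 1.0 / n2)


if __name__ == "__main__":
    print(sup_risk_on_grid(2.0, 5, 10), upper_bound(2.0, 5, 10))
```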


Example 2. $\pi_1$ and $\pi_2$ are two soporific drugs, the random variables generated
by them being the duration of sleep induced by a standard dose in an individual
chosen at random. It is assumed that these two populations are normal with
unknown means $m_1$, $m_2$ and a common variance $\sigma^2$, also unknown. In a series
of independent trials in which $n_1$ individuals received the first drug and $n_2$ the
second, the outcome was $v = (x_{11}, x_{12}, \ldots, x_{1n_1}; x_{21}, x_{22}, \ldots, x_{2n_2})$. The
statistician S is required to say which is the more effective drug.

Here a reasonable risk function is (6), where $f(v)$ takes on only the values
0, 1, corresponding to the decisions "$m_1 < m_2$" and "$m_1 > m_2$" respectively.
The problem of choosing $f(v)$ so as to minimize this risk was considered by Simon
[4]. He showed that in case $n_1 = n_2$, $f^0(v)$ is the uniformly best decision function
in the class of symmetric decision functions. (Given $n_1 = n_2 = n$, a decision
function $f(v)$ is said to be symmetric if $f(x_{11}, x_{12}, \ldots, x_{1n}; x_{21}, x_{22}, \ldots, x_{2n}) =
1 - f(x_{21}, x_{22}, \ldots, x_{2n}; x_{11}, x_{12}, \ldots, x_{1n})$. See also [3].) It is natural to confine
oneself to the class of symmetric decision functions when the sample sizes are
equal, but under the implicit assumption that if $\omega = (a, b : \sigma)$ is a possible
parameter point, then $\omega' = (b, a : \sigma)$ is also (cf. the remarks following (29)). The
illustrations in Section 1 show that if the sample sizes are unequal or if $\Omega$ is not
symmetric in the sense just described, there may exist decision functions which
are uniformly better than $f^0(v)$: in (i) we have a "symmetric" $\Omega$ but $n_1 \ne n_2$; in
(iii), $n_1 = n_2$ but $\Omega$ is not "symmetric."

However $f^0(v)$ is an admissible minimax decision function no matter what
the sample sizes, provided only that $\Omega$ satisfies a certain not too restrictive
condition. We have

(39)  $\bar r(f^0 \mid \omega) = G(-|d(\omega)|)$ for $m_1 \ne m_2$,  $\bar r(f^0 \mid \omega) = 0$ for $m_1 = m_2$.


⁴ For some purposes it would be more appropriate to take (1) as the risk function for this
problem, letting the decision functions $f(v)$ take on only the values 0 and 1. We have
(essentially) discussed this case in the previous example.





It is clear that if $\{\omega_k\}$ is a sequence of points in $\Omega$ such that

$\lim_{k \to \infty} d(\omega_k) = 0$, then $\lim_{k \to \infty} \bar r(f^0 \mid \omega_k) = \tfrac{1}{2} = \sup_{\omega \in \Omega} [\bar r(f^0 \mid \omega)]$.

Therefore, by Theorem 3, $f^0(v)$ is admissible and minimax if some point in the
plane $\{\omega : m_1 = m_2\}$ is an interior point of the set $\Omega$ of possible parameter points
(in fact it is sufficient if some plane $\sigma = \sigma_0\ (> 0)$ intersects $\Omega$ in a set which
has an interior point on the line $m_1 = m_2$). Hence if nothing much is known
about the two drugs, S could regard the foregoing as a justification for asserting
"$m_1 > m_2$" if $\bar x_1 > \bar x_2$ and "$m_1 < m_2$" otherwise.

We have given no criterion for the choice of a suitable decision function when
two or more admissible minimax decision functions exist, and our diffidence in
recommending the use of $f^0(v)$ in the present case is due to the fact that under
the condition stated above there will exist decision functions other than $f^0(v)$
which are also admissible and minimax with respect to (6). Let us suppose that
$\Omega$ is given by (33). Then $f^0(v)$ is admissible and minimax, by the preceding
paragraph. However, it follows from Theorem 4 that each of

$f_{c_1}^0(v) = 1$ if $\bar x_1 > c_1$, 0 otherwise,  and  $f_{c_2}^0(v) = 0$ if $\bar x_2 > c_2$, 1 otherwise,

is also admissible and minimax, where $c_1$ and $c_2$ are arbitrary constants with
$\max[a, b] - l/2 < c_1, c_2 < \min[a, b] + l/2$.
There is, however, some reason for preferring $f^0(v)$ to other decision functions
in the present case. S has been asked to give his opinion as to which is the better
drug, and presumably no immediate consequences follow from the opinion which
he might express. (This would not be the case if there were a sleepless individual
on hand who had to be given a dose of one of the two drugs. Cf. footnote 4.)
Although the problem is of a scientific nature, insistence upon literal exactitude
in the interpretation of "incorrect decision" is meaningful only insofar as it is
compatible with the physical situation. In view of the limited determinacy of
unknown parameters in general, and of the limitations of experiments on soporific
drugs in particular, it may be possible and even desirable to modify (6) in such
a way that for any fixed $\sigma$ the risk tends to zero with $|m_1 - m_2|$. Thus modified,
the risk function would be essentially similar to (1). A rather drastic way of
introducing this modification would be to agree that the assertion of equality
of the two means does not constitute an error in case $|m_1 - m_2| < \epsilon$, where $\epsilon$ is
some positive constant. S will then take


(40)  $\bar r_\epsilon(f \mid \omega) = \bar r(f \mid \omega)$ if $|m_1 - m_2| \ge \epsilon$,  $\bar r_\epsilon(f \mid \omega) = 0$ otherwise,

as the risk function. (Note that in using $\bar r_\epsilon(f \mid \omega)$ rather than $\bar r(f \mid \omega)$, S has in
effect deleted the set $\{\omega : |m_1 - m_2| < \epsilon\}$ from the given set $\Omega$ by defining
$\gamma(\omega) = (0, 0)$ there, instead of only when $m_1 = m_2$ as in the case of $\bar r(f \mid \omega)$.
Cf. "zones of indifference," [5, pp. 27-30].) It follows from Theorem 3 that $f^0(v)$ is the unique
minimax decision function with respect to (40) and (33) if $a = b$ and $n_1 = n_2$
and also if at least one of these conditions does not hold but

$l > \max\left[\,|a - b| + \epsilon,\ \max(|a - b|,\ \epsilon) + \frac{|n_1 - n_2|}{n_1 + n_2}\,\epsilon\right]$.

Thus $f^0(v)$ will be the unique minimax decision function no matter what $n_1$,
$n_2$, $a$, $b$ or $l$ may be, provided only that $\epsilon$ is sufficiently small. We shall leave
other modifications of $\bar r(f \mid \omega)$ and discussion of $\bar r(f \mid \omega)$ with respect to other
types of parameter spaces (e.g. (37)) to the reader.

We conclude this discussion with a remark on the proper choice of $n_1$ and $n_2$
in using $f^0(v)$ when the risk function belongs to the class $R$ defined in Section
2, (C). (The risk functions (1) and (6) belong to $R$.) Suppose that before
experimentation starts, it is agreed that one must have $n_1 + n_2 = 2k$, where $k$ is a
fixed integer. In that case, choosing $n_1 = n_2 = k$ will be the best choice of $n_1$,
$n_2$ in the following sense. (a) For any fixed $\omega$, $r(f^0 \mid \omega)$, which is the expected loss,
then becomes a minimum. This follows immediately from (22), since

$r(f^0 \mid \omega) = h(\omega)\,G(-|d(\omega)|)$,  $|d(\omega)| = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} |m_1 - m_2| / \sigma$,

and $|d(\omega)|$ has its maximum when $n_1 = n_2 = k$. (b) For any fixed $\omega$, the variance
of the loss also becomes a minimum. In using $f^0(v)$, the loss takes the values 0
and $h(\omega)$ only, with $P(\text{loss} = h(\omega) \mid \omega) = G(-|d(\omega)|) = \alpha$ say. Therefore,
the variance of the loss is $h^2\alpha(1 - \alpha)$. Since $\alpha \le \tfrac{1}{2}$, this expression increases with
increasing $\alpha$, and so has its minimum when $n_1 = n_2 = k$. This remark is, of course,
without prejudice to the question of whether $f^0(v)$ is admissible and minimax with
respect to a given $\Omega$ for every $n_1$ and $n_2$ with $n_1 + n_2 = 2k$.
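
The remark on the choice of $n_1$ and $n_2$ can be illustrated by tabulating the
expected loss $h\,G(-|d|)$ and the variance $h^2\alpha(1 - \alpha)$ for every split
of a fixed total $2k$. This is only a sketch: the function name
`allocation_table` and the parameter values in the usage lines are
hypothetical.

```python
import math


def G(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def allocation_table(k, m1, m2, sigma, h):
    """For each split n1 + n2 = 2k, return (n1, n2, expected loss, variance)
    of the loss incurred by the rule 'choose population 1 iff xbar1 > xbar2'."""
    rows = []
    for n1 in range(1, 2 * k):
        n2 = 2 * k - n1
        abs_d = abs(m1 - m2) / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))
        alpha = G(-abs_d)                      # P(loss = h | omega)
        rows.append((n1, n2, h * alpha, h * h * alpha * (1.0 - alpha)))
    return rows


if __name__ == "__main__":
    for row in allocation_table(k=5, m1=1.0, m2=0.0, sigma=1.0, h=1.0):
        print(row)
```

In the printed table both the expected loss and its variance are smallest in the
row with $n_1 = n_2 = k$, as asserted in (a) and (b).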



4. A remark on randomized decision functions. In the foregoing discussion
we have confined attention to the class of non-randomized decision functions;
the space of possible decisions being some subset of $0 \le f \le 1$, the statistician
constructs (in advance) a suitable decision function $f(v)$, obtains a particular
sample point $v$ by sampling the two populations, and takes $f(v)$ as his decision.
It is, however, of some theoretical interest to consider more general formulations
in which the decision arrived at by the statistician may be a random function
of the sample point $v$.

A randomized decision function can be defined in several ways. One definition
is as follows. Let $\phi(z \mid v)$ be a function defined for all $v$ in $E_N$ and all real $z$ such
that for any fixed $z$ it is a measurable function of $v$, and such that for any fixed
$v$ it is the distribution function of a random variable with values in $0 \le z \le 1$.
We shall denote this random variable by $Z_\phi(v)$ and call it a (randomized) decision
function. In using it, the statistician first obtains a particular point $v$ by sampling
the two populations, then performs a random experiment whose outcome $Z$
has the known distribution function $P(Z \le z) = \phi(z \mid v)$, and takes $Z$ as his
decision. The class of all decision functions corresponding to all functions $\phi(z \mid v)$
will be denoted by $\{Z_\phi(v)\}$. It is clear that this class includes the class of
non-randomized decision functions.

This definition of the structure of randomized decision functions follows the
method described by Halmos and Savage in their interesting remarks ([6], pp.
239-241) on the value of sufficient statistics in statistical methodology. For
any $Z_\phi(v)$, we have

(41)  $P(Z_\phi(v) \le z \mid \omega) = \int_{E_N} P(Z_\phi(v) \le z \mid \omega, v) \, dK(v \mid \omega)$
      $= \int_{E_N} \phi(z \mid v) \, dK(v \mid \omega)$.

We shall now show that in all problems of the greater mean in which the
methods of Section 2 can be applied to non-randomized decision functions,
randomization cannot be recommended. More precisely, the following holds.

Theorem. Let $f(v)$ be a non-randomized decision function which takes on only
the values 0 and 1 and which is the unique non-randomized decision function whose
expected value $E[f \mid \omega]$ satisfies a certain condition $Q$ as a function of $\omega$. Then $f(v)$
is the unique decision function whose expected value satisfies the condition $Q$; i.e. if
$Z_\phi(v)$ is a decision function such that $E[Z_\phi \mid \omega]$ satisfies $Q$, then

(42)  $P(f(v) = Z_\phi(v) \mid \omega) = 1$ for all $\omega$.

It follows in particular that Theorem 2 remains valid with the arbitrary
non-randomized $f(v)$ replaced by an arbitrary $Z_\phi(v)$, and in consequence, Theorems 3 and 4
remain valid when the class of decision functions in question is $\{Z_\phi(v)\}$.

Proof. Let $Z_\phi(v)$ be a decision function whose expected value satisfies the
condition $Q$. Now, by (41) and Theorem 5 of [7] we have

(43)  $E[Z_\phi \mid \omega] = \int_{E_N} f_\phi(v) \, dK(v \mid \omega) = E[f_\phi \mid \omega]$,

where

(44)  $f_\phi(v) = \int_0^1 z \, d_z\phi(z \mid v)$,  $0 \le f_\phi(v) \le 1$.

It is clear from (43) that $E[f_\phi \mid \omega]$ satisfies $Q$ and so we must have

(45)  $f_\phi(v) = f(v)$ a.e.

by hypothesis. Since $f(v)$ takes on only the values 0 and 1, it follows from (44)
and (45) that

$\int_{\{z = f(v)\}} d_z\phi(z \mid v) = 1$ a.e.,

which implies (42). In order to verify the last part of the remark, consider any
particular problem of the greater mean. The risk function of any decision
function $Z_\phi(v)$ is, by (15),

$r(Z_\phi \mid \omega) = W(\omega, E[Z_\phi \mid \omega])$.

Hence a condition on the risk function of $Z_\phi(v)$ is equivalent to a condition on
$E[Z_\phi \mid \omega]$ as a function of $\omega$, and the truth of the remark follows by appropriate
definition of the condition $Q$ in terms of the risk function.
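
The identity (43), on which the whole argument rests, can be seen in miniature
on a finite sample space. The sketch below is a toy illustration under
assumptions: the sample space has three points with hypothetical probabilities
`p_v`, and each conditional law $\phi(\cdot \mid v)$ is a hypothetical discrete
distribution on $[0, 1]$ given as a list of `(z, prob)` pairs. It shows that the
expected decision, and hence the risk $W(\omega, E[Z_\phi \mid \omega])$, depends on
the randomized rule only through its conditional mean.

```python
def mean_rule(phi):
    """Conditional mean of Z given v, for each sample point v.
    phi[v] is a list of (z, prob) pairs with 0 <= z <= 1."""
    return [sum(z * p for z, p in law) for law in phi]


def expected_decision(phi, p_v):
    """E[Z] computed by enumerating both the sample point and the
    outcome of the auxiliary random experiment."""
    return sum(pv * z * p for pv, law in zip(p_v, phi) for z, p in law)


def expected_of_mean(phi, p_v):
    """E[f], where f(v) is the non-randomized rule mean_rule(phi)[v]."""
    return sum(pv * f for pv, f in zip(p_v, mean_rule(phi)))


def W(g1, g2, p):
    """Loss W(omega, p) = max[g1, g2] - g1*p - g2*(1 - p) from (C)."""
    return max(g1, g2) - g1 * p - g2 * (1.0 - p)


if __name__ == "__main__":
    phi = [[(0.0, 0.5), (1.0, 0.5)],   # a fair coin toss at the first point
           [(0.25, 1.0)],              # a fixed fraction at the second
           [(1.0, 1.0)]]               # a sure decision at the third
    p_v = [0.2, 0.3, 0.5]
    print(expected_decision(phi, p_v), expected_of_mean(phi, p_v))
    print(W(2.0, 1.0, expected_decision(phi, p_v)))
```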

REFERENCES

[1] A. Wald, "Statistical decision functions," Annals of Math. Stat., Vol. 20 (1949),
    pp. 165-205.
[2] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical
    hypotheses," Stat. Res. Memoirs, Vol. I (1936), pp. 1-37.
[3] R. R. Bahadur, "On a problem in the theory of k populations," Annals of Math. Stat.,
    Vol. 21 (1950), pp. 362-375.
[4] H. A. Simon, "Symmetric tests of the hypothesis that the mean of one normal population
    exceeds that of another," Annals of Math. Stat., Vol. 14 (1943), pp. 149-154.
[5] A. Wald, Sequential Analysis, John Wiley and Co., 1947.
[6] P. R. Halmos and L. J. Savage, "Application of the Radon-Nikodym theorem to the
    theory of sufficient statistics," Annals of Math. Stat., Vol. 20 (1949), pp. 225-241.
[7] H. Robbins, "Mixture of distributions," Annals of Math. Stat., Vol. 19 (1948),
    pp. 360-369.



ANALYSIS OF EXTREME VALUES
By W. J. Dixon¹
University of Oregon

1. Introduction. It is well recognized by those who collect or analyze data
that values occur in a sample of n observations which are so far removed from
the remaining values that the analyst is not willing to believe that these values
have come from the same population. Many times values occur which are
"dubious" in the eyes of the analyst and he feels that he should make a decision as
to whether to accept or reject these values as part of his sample. On the other
hand he may not be looking for an error, but may wish to recognize a situation
when an occasional observation occurs which is from a different population.
He may wish to discover whether a significant analysis of variance indicates an
extreme value significantly different from the remainder. Also, of course, the
extreme value may differ significantly without causing a significant analysis
of variance and he may wish to discover this. It is reasonable to suppose that a
criterion for rejecting observations would be useful here also. The choice of a
suitable criterion for rejecting observations introduces a number of questions.

1. Should any observations be removed if we wish a representative sample
including whatever contamination arises naturally? In other words, it may be
desirable to describe the population including all observations, for only in that
way do we describe what is actually happening.

2. If the analyst wishes to sample the population unaffected by contamination
he must either remove the contaminating items or employ statistical procedures
which reduce to a minimum the effect of the contamination on the estimates of
the population. That is, he may wish to describe only 95% of his population
if the description is altered radically by the remaining 5% of the observations.
He may have external reasons which are good and sufficient for wishing to
describe only 95% of his observations. Suppose he wishes to use the sample for a
statistical inference; the inclusion of all the data may sufficiently violate the
assumptions underlying the inference to exclude the possibility of making a valid
inference.

This paper will concern itself only with those problems which arise from
Question 2.

If we wish to follow some procedure which attempts to remove contamination
we must consider the performance of any proposed criterion with respect to the
proportion of contamination the criterion will discover and, of course, the
proportion of the “good” observations which are removed by the use of the criterion.
But, perhaps more important, we must consider what sort of bias will result
when the standard statistical procedures are applied to samples of observations
which have been processed in this manner.

¹ This paper was prepared under a contract with the Office of Naval Research.


If we wish to follow a procedure which will not search for particular values to
be excluded but will minimize their effect if present, we must investigate the
sampling distributions of these modified statistics and estimate the loss in
information resulting from their use when all observations are “good.” We must
also investigate the expected bias which will result when “bad” items are present
even though essentially excluded. Perhaps most disturbing about the avoidance
of “bad” items is the fact that a decision must still be made as to whether a
“bad” item was present or not in order to know in which way our estimates may
be biased. For example, a sample mean computed by avoiding the two end
observations will not be a biased estimate of the mean of a symmetric population
if both end items should actually be included or if both end items should not be
included. However, if only one of the two should not be included this estimate of
the mean will be biased.
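
The bias described in the example above can be illustrated by a small sampling experiment. The sketch below is not part of the original study: the function names, the choice of a standard normal population, and the use of a contaminator drawn from a normal population shifted by `lam` standard deviations are all assumptions made for illustration.

```python
import random


def trimmed_mean(xs):
    """Mean of the sample after dropping the smallest and largest values."""
    x = sorted(xs)
    return sum(x[1:-1]) / (len(x) - 2)


def average_trimmed_mean(n, n_bad, lam, trials, rng):
    """Average trimmed mean over many samples of size n from N(0, 1)
    in which n_bad observations are drawn from N(lam, 1) instead
    (an assumed form of contamination, used only for illustration)."""
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n - n_bad)]
        sample += [rng.gauss(lam, 1.0) for _ in range(n_bad)]
        total += trimmed_mean(sample)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # No contaminator: both end items belong, and the estimate centers on 0.
    print(average_trimmed_mean(5, 0, 5.0, 4000, rng))
    # One contaminator: only one end item should have been dropped,
    # so the estimate is pulled away from 0.
    print(average_trimmed_mean(5, 1, 5.0, 4000, rng))
```

With one contaminator at the upper end, trimming removes it but also removes the smallest of the "good" values, which is what shifts the average of the remaining observations.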

2. Models of contamination. The performance of the various criteria for
discovery of one or more contaminators will be measured with reference to
contaminations of the following two types entering into samples of observations
from a normal population with mean $\mu$ and variance $\sigma^2$, $N(\mu, \sigma^2)$:

A. One or more observations from $N(\mu + \lambda\sigma, \sigma^2)$,

B. One or more observations from $N(\mu, \lambda^2\sigma^2)$.

A represents the occurrence of an “error” in mean value such as will occur in
dial readings when errors are made in reading incorrectly digits other than the
last one or two digits. Errors of this sort may result from momentary shifts in
line voltage or from the inclusion among a group of objects of one or two items
of completely different origin. This type of contamination will be referred to as
“location error.” B represents the occurrence of an “error” from a population
with the same mean but with a greater variance than the remainder of the sample.
This type of error will be referred to as a “scalar error.” It is likely that many
errors could be better described as a combination of A and B, but a study of these
two errors separately should throw considerable light on the question of “gross
errors” or “blunders.”
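
The two models of contamination can be sketched as a small sampling routine. This is an illustrative sketch only; the function name `contaminated_sample`, its parameters, and the use of Python's standard `random` module are assumptions and not the procedure used in the paper.

```python
import random


def contaminated_sample(n, k, lam, kind, mu=0.0, sigma=1.0, rng=None):
    """Return (good, bad): n - k observations from N(mu, sigma^2) and
    k contaminators from the chosen model.

    kind="location": model A, N(mu + lam*sigma, sigma^2)
    kind="scalar":   model B, N(mu, lam^2 * sigma^2)
    """
    rng = rng or random.Random()
    good = [rng.gauss(mu, sigma) for _ in range(n - k)]
    if kind == "location":
        bad = [rng.gauss(mu + lam * sigma, sigma) for _ in range(k)]
    elif kind == "scalar":
        # random.gauss takes a standard deviation, hence lam * sigma.
        bad = [rng.gauss(mu, lam * sigma) for _ in range(k)]
    else:
        raise ValueError("kind must be 'location' or 'scalar'")
    return good, bad


if __name__ == "__main__":
    rng = random.Random(0)
    good, bad = contaminated_sample(15, 1, 5.0, "location", rng=rng)
    print(sorted(good + bad))
```

Keeping the contaminators in a separate list makes it possible to check later whether an extreme value that a criterion rejects really came from the contaminating population.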

Many authors have written on the subject of the rejection of outlying
observations. Apparently none have been successful in obtaining a general solution to
the problem. Nor has there been success in the development of a criterion for
discovery of outliers by means of a general statistical theory; e.g., maximum
likelihood. A large number of criteria have been advanced on more or less
intuitive grounds as appropriate criteria for this purpose. In no case was investigation
made of the performance of these criteria except for a few illustrative examples.
References for the criteria discussed in the next section are given at the end
of this paper. Indications are given as to the significance values available in
those papers.





3. Criteria to be considered. The performance of two types of criteria has
been investigated for samples contaminated with location or scalar errors:

a) $\sigma$ known or estimated independently,

b) $\sigma$ unknown.

The n observations are ordered $x_1 < x_2 < \cdots < x_n$. The criteria involving
external knowledge of $\sigma$ are:

A. $\chi^2$ test,

    $\chi^2 = \sum (x - \bar{x})^2 / \sigma^2$.

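B. Extreme deviation,

    $B_1 = (x_n - \bar{x})/\sigma$  (or $(\bar{x} - x_1)/\sigma$),

    $B_2 = (x_n - x_{n-1})/\sigma$  (or $(x_2 - x_1)/\sigma$).
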
C. Range,

    $C_1 = w/\sigma$,  $w = x_n - x_1$,

    $C_2 = w/s$,  $s^2 = \sum (x - \bar{x})^2/(n - 1)$  ($s$ independently estimated).

The criteria involving only the information of a single sample of n observations
are:

D. Modified F test.

1. For single outlier $x_1$,

    $D_1 = S_1^2/S^2$, where $S^2 = \sum_{1}^{n} (x - \bar{x})^2$,  $\bar{x}_1 = \sum_{2}^{n} x/(n - 1)$,
    $S_1^2 = \sum_{2}^{n} (x - \bar{x}_1)^2$

    (or for $x_n$, $D_1 = S_n^2/S^2$).

2. For double outliers $x_1$, $x_2$,

    $D_2 = S_{1,2}^2/S^2$, where $S_{1,2}^2 = \sum_{3}^{n} (x - \bar{x}_{1,2})^2$,  $\bar{x}_{1,2} = \sum_{3}^{n} x/(n - 2)$

    (or for $x_n$, $x_{n-1}$, $D_2 = S_{n-1,n}^2/S^2$).

E. Ratios of ranges and subranges.

1. For single outlier $x_1$,

    $r_{10} = (x_2 - x_1)/(x_n - x_1)$

    (or for $x_n$, $r_{10} = (x_n - x_{n-1})/(x_n - x_1)$).

2. For single outlier $x_1$ avoiding $x_n$,

    $r_{11} = (x_2 - x_1)/(x_{n-1} - x_1)$

    (or for $x_n$ avoiding $x_1$, $r_{11} = (x_n - x_{n-1})/(x_n - x_2)$).

3. For single outlier $x_1$ avoiding $x_n$, $x_{n-1}$,

    $r_{12} = (x_2 - x_1)/(x_{n-2} - x_1)$

    (or for $x_n$ avoiding $x_1$, $x_2$, $r_{12} = (x_n - x_{n-1})/(x_n - x_3)$).

4. For outlier $x_1$ avoiding $x_2$,

    $r_{20} = (x_3 - x_1)/(x_n - x_1)$

    (or for $x_n$ avoiding $x_{n-1}$, $r_{20} = (x_n - x_{n-2})/(x_n - x_1)$).

5. For outlier $x_1$ avoiding $x_2$ and $x_n$,

    $r_{21} = (x_3 - x_1)/(x_{n-1} - x_1)$

    (or for $x_n$ avoiding $x_{n-1}$, $x_1$, $r_{21} = (x_n - x_{n-2})/(x_n - x_2)$).

6. For outlier $x_1$ avoiding $x_2$ and $x_n$, $x_{n-1}$,

    $r_{22} = (x_3 - x_1)/(x_{n-2} - x_1)$

    (or for $x_n$ avoiding $x_{n-1}$, $x_1$, $x_2$, $r_{22} = (x_n - x_{n-2})/(x_n - x_3)$).

F. Extreme deviation and standard deviation.

For single outlier $x_n$,

    $F = (x_n - \bar{x})/s$  (or for $x_1$, $F = (\bar{x} - x_1)/s$).

The performance of the large number of criteria listed here will be assessed
with respect to discovery of contamination of the type given in Section 2.
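
The six ratios of ranges and subranges under E follow one pattern, which the sketch below makes explicit. The function name `dixon_ratio` and its argument convention are assumptions introduced here: `i` is the first subscript (1 when the numerator uses the nearest neighbor of the suspected value, 2 when that neighbor is avoided) and `j` is the second subscript (the number of observations avoided at the opposite end).

```python
def dixon_ratio(xs, i, j, upper=True):
    """Compute r_ij for the largest value (upper=True) or the smallest.

    For the largest value:  r_ij = (x_n - x_{n-i}) / (x_n - x_{1+j}).
    For the smallest value: r_ij = (x_{1+i} - x_1) / (x_{n-j} - x_1).
    """
    if i not in (1, 2) or j not in (0, 1, 2):
        raise ValueError("i must be 1 or 2 and j must be 0, 1 or 2")
    x = sorted(xs)
    n = len(x)
    # The numerator gap must be strictly inside the denominator range;
    # otherwise the ratio is identically 1 and is not a test.
    if n < i + j + 2:
        raise ValueError("sample too small for this ratio")
    if upper:
        return (x[-1] - x[-1 - i]) / (x[-1] - x[j])
    return (x[i] - x[0]) / (x[-1 - j] - x[0])


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 10.0]
    for i, j in [(1, 0), (1, 1), (1, 2), (2, 0), (2, 1)]:
        print(f"r{i}{j} =", dixon_ratio(sample, i, j))
```

The size check mirrors the remark made later in the paper that $r_{22}$ is not a test for n = 5: with five observations its numerator and denominator are the same difference.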





4. Performance of criteria (estimate of $\sigma$ available). The $\chi^2$ test will of course
give an indication of a large dispersion and since the extreme values are chief
contributors to the sum of squares, it is possible to use this test as a criterion for
rejecting a value or values which are at the greatest distance from the mean.
It might be supposed that $B_1$ and $B_2$ would give better results since particular
attention is paid to the end item. The same argument would influence one in
favor of $C_1$ or $C_2$. The performance of $C_2$ can, of course, be expected to vary with
the degrees of freedom in the independent estimate of $\sigma$. For this study the
degrees of freedom for this estimate were held to the single value 9 d.f.

$\chi^2$ may be used since if the value of $\chi^2$ is too large (greater than some upper
percentage point for $\chi^2$) we might reject the value most distant from the mean.
$\chi^2$ tables may be used for percentage points. Percentage points for the other
statistics considered here are given in the references at the end of this paper.
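
The criteria that use a known $\sigma$ can be computed together from one ordered sample. The sketch below is illustrative; the function name is an assumption, the upper-tail forms are used throughout, and the form of $B_2$ (gap between the two largest values divided by $\sigma$) is taken from the description of Irwin's criterion in the reference list rather than from a formula legible in this section.

```python
from statistics import mean


def known_sigma_criteria(xs, sigma):
    """Return the criteria A (chi-square), B1, B2 and C1 for a sample,
    testing the largest observation, with sigma assumed known."""
    x = sorted(xs)
    xbar = mean(x)
    return {
        # A: sum of squared deviations from the sample mean over sigma^2
        "chi2": sum((v - xbar) ** 2 for v in x) / sigma ** 2,
        # B1: extreme deviation from the sample mean
        "B1": (x[-1] - xbar) / sigma,
        # B2: gap between the two largest observations (assumed form)
        "B2": (x[-1] - x[-2]) / sigma,
        # C1: range over sigma
        "C1": (x[-1] - x[0]) / sigma,
    }


if __name__ == "__main__":
    print(known_sigma_criteria([1.0, 2.0, 3.0, 4.0, 10.0], sigma=2.0))
```

$C_2$ is omitted because it needs an estimate $s$ from an independent sample, which a single-sample helper of this kind does not have.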

The criteria A, $B_1$, $B_2$, $C_1$, $C_2$ were investigated for $\alpha$ = 1%, 5% and 10%
for $\lambda$ = 2, 3, 5, 7, where one or more items are selected from a population
$N(\mu + \lambda\sigma, \sigma^2)$ and the remainder from $N(\mu, \sigma^2)$. Investigations were also made for one
item from $N(\mu, \lambda^2\sigma^2)$ for $\lambda$ = 2, 4, 8, 12. The investigation was carried out by
sampling methods. The performances of different criteria were assessed for the
same group of samples in order to obtain more precision in the comparison of the
different tests. All of the points appearing on the graphs in the subsequent
sections of this paper were based on from 60 to 200 determinations.

The performance of the above criteria is measured by computing the
proportion of the time the contaminating distribution provides an extreme value and
the test discovers this value. Of course, performance could be measured by the
proportion of the time the test gives a significant value when a member of the
contaminating population is present in the sample, even though not at an
extreme. However, since it is assumed that discovery of an outlier will frequently
be followed by the rejection of an extreme we shall consider discovery a success
only when the extreme value is from the contaminating distribution.

The performance was judged by applying the criteria to each sample, always 
suspecting an outlier in the direction of the shifted mean for location error. 
Since the location errors were inserted by adding a fixed value to one or more 
of the observations, the largest value was tested as an outlier. The measure of 
performance was the percentage of location errors identified. When the location 
error was not an outlier, no test was performed and a failure for the test recorded. 
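
The measure of performance described above can be sketched as a sampling experiment for $B_1$ with one location error. This is an illustration under stated assumptions, not the study's procedure: the critical value is estimated here by simulation from uncontaminated samples, whereas the paper takes percentage points from published tables, and all function names are hypothetical.

```python
import random


def b1(xs, sigma=1.0):
    """Extreme deviation of the largest value from the sample mean."""
    return (max(xs) - sum(xs) / len(xs)) / sigma


def critical_value(n, alpha, trials, rng):
    """Simulated upper-alpha point of B1 for samples of n from N(0, 1)."""
    stats = sorted(b1([rng.gauss(0.0, 1.0) for _ in range(n)])
                   for _ in range(trials))
    return stats[min(int((1.0 - alpha) * trials), trials - 1)]


def performance(n, lam, alpha, trials, rng):
    """Proportion of samples in which the single location error is the
    largest value AND B1 exceeds its critical value."""
    crit = critical_value(n, alpha, trials, rng)
    successes = 0
    for _ in range(trials):
        good = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
        bad = rng.gauss(lam, 1.0)
        sample = good + [bad]
        # A failure is recorded when the contaminator is not the extreme.
        if bad == max(sample) and b1(sample) > crit:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    rng = random.Random(1)
    for lam in (2.0, 3.0, 5.0, 7.0):
        print(lam, performance(5, lam, 0.05, 2000, rng))
```

Counting a success only when the contaminator is itself the largest value is the design choice stated in the text; counting every significant result would overstate how often rejection removes the right observation.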

In the case of the model of contamination involving the scalar error, the value
was suspected which was farthest from the mean. This, of course, alters somewhat
the level of significance, but this procedure was followed alike for all criteria
investigated. The performance was measured in the same fashion as for location
errors.

Considering first location errors, a study of the performance curves showing
the per cent discovery of contaminators plotted against $\lambda$ (the number of standard
deviation units the population of contaminators is removed from the remainder)
shows that the level of performance for $\sigma$ known is considerably above the level
of performance when $\sigma$ is not known. The difference is greater for n = 5 than
for n = 15 and, of course, the difference will diminish as the sample size increases.
Figure 1 shows the performance curves for $\alpha$ = 5% (5% significance level for
the test for an outlier) of $B_1 = (x_n - \bar{x})/\sigma$ for n = 5 and n = 15 and of
$r_{10} = (x_n - x_{n-1})/(x_n - x_1)$ for n = 5 and n = 15.

The graphs for $\alpha$ = 1% and 10% would be similar in appearance. Figure 2
indicates the change in performance for $\alpha$ = 1%, 5%, and 10%. The curves
plotted are for $B_1 = (x_n - \bar{x})/\sigma$. The curves for A, $B_2$, $C_1$, $C_2$ show very similar
results.

The curve for test $B_1$ was used in Figures 1 and 2 since it gives the best
performance of all criteria which are considered here if a single location error is
present. The curves showing the comparative performance of these criteria as
well as one to be considered later ($r_{10}$) are given in Figure 3 for $\alpha$ = 5% and for
n = 5 and n = 15.

Fig. 1. Improvement in performance obtained with knowledge of $\sigma$; $\alpha$ = 5%, n = 5, 15.

Fig. 2. The effect of the level of significance on the performance of $B_1$; $\alpha$ = 1%, 5%, 10%; n = 5, 15.

The following statements can be made from inspection of Figure 3:

a) The differences among A, $B_1$, $B_2$, and $C_1$ are not great.

b) The knowledge of $\sigma$ is less important in larger samples.

c) The curve for $C_2$ lies above that of $r_{10}$ for n = 5 and below that of $r_{10}$ for
n = 15. This is consistent with the use of 9 d.f. in the independent estimate
of $\sigma$.

If the question of ease in computation or application is important, it may be
desirable to use $B_2$ or $C_1$ in place of $B_1$ for they are slightly easier to compute
and it is not necessary to measure all observations to obtain the value of these
statistics. From Figure 3 it will be noted that the performances of these criteria
are nearly as good as for $B_1$. If two outliers may be expected in a single sample,
the performance of $B_2$ will be lowered and the performance of $B_1$ and $C_1$ will be
improved. Any difference between the performance of $B_1$ and the performance
of $C_1$ when two outliers are present was not discernible for n = 5 or 15. Figure 4
illustrates the improvement in performance for $B_1$ for $\alpha$ = 5% and n = 15.

Fig. 3. Comparison of the performance of criteria using $\sigma$ known (or using external
estimates of $\sigma$) and $r_{10}$ for samples of size 5 and 15, $\alpha$ = 5%.

The performance curves of these criteria if a scalar error is present are very
similar to those above except that:

1. A high level of performance is approached very slowly. For example, see
Figure 5 showing the performance of $B_1$ and $r_{10}$ for n = 5 and n = 15 and $\alpha$ = 5%.

2. There is a smaller difference in the performance between the criteria with
$\sigma$ known and $\sigma$ unknown (see Figure 5).

The performance of $B_1$ and $C_1$ are noticeably increased by the introduction
of more contaminators while that of $B_2$ decreases. No difference in the
performances of $B_1$ and $C_1$ were noted for either n = 5 or n = 15. Figure 6 shows the
increase in performance of two contaminators for $B_1$ for n = 15, $\alpha$ = 5%.

Fig. 4. Comparison of the performance of $B_1$ for one and two location errors in samples
of size 15, $\alpha$ = 5%.

The general recommendations for possibilities of either type of contamination,
location or scalar errors, would lead one to the use of $B_1$ or $C_1$ if $\sigma$ is known.

Criterion $C_1$ is recommended since:

1. Its performance is almost as good as the performance of $B_1$ for a single
outlier. Their performances are about equal for two outliers and $C_1$ affords
protection for outliers either above or below the mean.

2. It is simple to compute.

If ease of computation is not essential and maximum performance is desired,
the criterion $B_1$ should be used. The performance of $C_2$ will approach that of
$B_1$ as the number of degrees of freedom in the denominator increases.



Fig. 5. Comparison of the performance of $B_1$ and $r_{10}$ for one scalar error for samples of
size 5 and 15, $\alpha$ = 5%.

Fig. 6. Comparison of the performance of $B_1$ for one and two scalar errors in samples of
size 15, $\alpha$ = 5%.


5. Performance of criteria (no external estimate of $\sigma$). Criteria $D_1$ and $D_2$
have strong intuitive reasons for their use since the dispersion is estimated by
$s^2$. The r ratios are attractive because of their simplicity and their preoccupation
with the extreme values. Test F is the “studentized” ratio corresponding to $B_1$,
and is equivalent to $D_1$ since $D_1 = 1 - F^2/(n - 1)$. There is no apparent
difference in the performance of $D_1$ and $r_{10}$ when one outlier is present and no
apparent difference in $D_2$ and $r_{20}$ when two outliers are present. This is true for
both models of contamination and for the three levels of significance investigated.
However the comparison of $D_2$ and $r_{20}$ was made only for n = 5 since critical
values are not available² for $D_2$ for n = 15. (Critical values are available for
n ≤ 12.)
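
The equivalence of F and $D_1$ stated above can be checked numerically. The sketch below assumes that $s$ in F is computed with divisor n (the convention under which $D_1 = 1 - F^2/(n - 1)$ holds exactly); the divisor is not legible in this copy, so this is an assumption. With divisor n - 1 the relation becomes $D_1 = 1 - nF^2/(n - 1)^2$, and in either case $D_1$ is a decreasing function of F, so the two tests reject the same samples. The function names are hypothetical.

```python
def d1_largest(xs):
    """D1 = S_n^2 / S^2 for the largest observation x_n."""
    x = sorted(xs)
    n = len(x)
    xbar = sum(x) / n
    total_ss = sum((v - xbar) ** 2 for v in x)          # S^2
    rest = x[:-1]                                       # omit x_n
    rest_mean = sum(rest) / (n - 1)
    reduced_ss = sum((v - rest_mean) ** 2 for v in rest)  # S_n^2
    return reduced_ss / total_ss


def f_largest(xs):
    """F = (x_n - xbar) / s, with s^2 = S^2 / n (assumed divisor)."""
    x = sorted(xs)
    n = len(x)
    xbar = sum(x) / n
    s = (sum((v - xbar) ** 2 for v in x) / n) ** 0.5
    return (x[-1] - xbar) / s


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 10.0]
    n = len(sample)
    print(d1_largest(sample), 1.0 - f_largest(sample) ** 2 / (n - 1))
```

For the sample shown, $S^2 = 50$ and $S_n^2 = 5$, so both expressions equal 0.1.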

The performance of $D_1$ and $r_{10}$ under the two models of contamination can
be obtained by reference to the curve for $r_{10}$ in Figure 1 and Figure 5. The curve
for $D_1$ is practically identical with the curve for $r_{10}$.


² After this paper was submitted, the critical values of $D_2$ have been extended to n ≤ 20
(see references).




There is no question that $r_{10}$ is simpler to use, so that if this condition of
contamination (scalar errors) exists, $r_{10}$ would probably be chosen. However as
before, we should investigate what happens when more than one error is present.
$D_2$ is designed for this case as is $r_{20}$. Since the performance of these two criteria
is approximately the same, $r_{20}$ would probably be chosen because of its simplicity.
Critical values for this statistic are available for n ≤ 30.

$r_{11}$, $r_{12}$, $r_{20}$, $r_{21}$, $r_{22}$ were designed for use in situations where additional
outliers may occur and we wish to minimize the effect of these outliers on the
investigation of the particular value being tested.

It has been suggested that $D_1$ could be used repeatedly to remove more than
one outlier from a sample. This procedure cannot be recommended since the
presence of additional outliers handicaps the performance of both $D_1$ and $r_{10}$
for small sample sizes and therefore the process of rejection might never get
started. For larger sample sizes the performance of $D_1$ is affected much less by
the presence of two errors than is the performance of $r_{10}$. The repetitive use of
$D_1$ is not recommended in this case either since $r_{20}$ performs in a superior
manner to $D_1$ in such situations. This difference in performance of $D_1$ and $r_{20}$
depends markedly on the level of significance used as well as the sample size.
For small samples there is little difference in performance for any of the levels
of significance one might use. For the larger sample sizes there is no appreciable
difference for very high levels of significance. The difference is however very
great for lower levels of significance. In fact as $\lambda$ increases for two errors of the
location type, the level of significance which divides the region of approach to
zero performance from the region of approach to perfect performance of $D_1$ is
given by the level of significance corresponding to a significance value of
$n/(2(n - 1))$ for $D_1$. Thus, for example, in samples of size 15,
$n/(2(n - 1)) = 15/28 = .536$.
This value lies between the values for the 2.5% and 5% level of significance.
These values are .503 and .566 respectively. Therefore the use of the 1% or
2.5% levels will give poorer and poorer performance as $\lambda$ increases, and the
use of the 5% or 10% levels will give better and better performance as $\lambda$ increases
when two errors are present. The dividing point is such that for samples of
size 11 or less the use of any of the given levels of significance will cause the
performance to decrease as $\lambda$ increases. For samples of size n < 14 the 1%,
2.5% and 5% levels have the same effect, and for samples of size n < 16 the 1%
and 2.5%, for samples of size n < 19 just the 1% level. For three such errors
the limit approached by $D_1$ as $\lambda$ increases is $2n/(3(n - 1))$. Therefore, the
performance of $D_1$ will approach zero for all levels of significance and for all sample
sizes for which critical values are known except the 10% level of significance
for sample sizes larger than 21. An indication of these limiting values
$(k - 1)n/(k(n - 1))$ for k contaminations present can be obtained by considering
these k values to be at a distance $\lambda$ from the population mean, computing $D_1$
and allowing $\lambda$ to increase indefinitely.

Fig. 7. Comparison of the performance of the r criteria for one location error in
samples of size 5, $\alpha$ = 5%.

Fig. 8. Comparison of the performance of the r criteria for one scalar error in samples
of size 5, $\alpha$ = 5%.
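
The limiting argument above can be checked directly. The sketch below places k contaminators at a common large distance from the remaining n - k values, computes $D_1$ for the largest observation, and compares it with $(k - 1)n/(k(n - 1))$. The closed form is the one obtained by carrying out the computation the text describes; the function names and the particular "good" values are assumptions for illustration.

```python
def d1_largest(xs):
    """D1 = S_n^2 / S^2 for the largest observation."""
    x = sorted(xs)
    n = len(x)
    xbar = sum(x) / n
    total_ss = sum((v - xbar) ** 2 for v in x)
    rest = x[:-1]
    rest_mean = sum(rest) / (n - 1)
    return sum((v - rest_mean) ** 2 for v in rest) / total_ss


def d1_limit(n, k):
    """Limit of D1 as k contaminators move off to infinity together."""
    return (k - 1) * n / (k * (n - 1))


if __name__ == "__main__":
    n, k, lam = 15, 2, 1.0e6
    # n - k "good" values of order 1, and k contaminators at distance lam.
    good = [0.5 if i % 2 == 0 else -0.5 for i in range(n - k)]
    sample = good + [lam] * k
    print(d1_largest(sample), d1_limit(n, k))
```

Because small values of $D_1$ are the significant ones, a limit of 15/28 exceeds the 2.5% critical value (.503) and is never significant at that level, while it falls below the 5% critical value (.566) and is eventually always significant there.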

The comparative performance of the r criteria, $\alpha$ = 5%, in samples of size 5
for the two models of contamination (one contaminator present) are given in
Figures 7 and 8. For samples of size 15 the curves are given in Figures 9 and 10.
A single curve suffices here since there is no discernible difference in the curves
for the different r criteria. There is considerable difference in the performance
curves if more than one outlier is present. However, the performances of $r_{10}$,
$r_{11}$, $r_{12}$ are essentially the same when two location outliers are present as are
the performances of $r_{20}$, $r_{21}$, $r_{22}$. Figures 11 and 12 show the comparative
performance of $r_{10}$, $r_{11}$, $r_{12}$ for one and two contaminators for $\alpha$ = 5% and n = 5.
Figures 13 and 14 are for n = 15. Figures 15 and 16 show the comparative
performance for $r_{20}$, $r_{21}$ ($r_{22}$ is not a test for n = 5) for one and two contaminators
for $\alpha$ = 5% and n = 5. Figures 17 and 18 are for $r_{20}$, $r_{21}$, $r_{22}$ for n = 15. The
six curves represented by the single curve of Figure 17 lie within 5% of the
curve shown. The same is true of the three curves represented by each of the
two curves of Figure 18.

Since no loss in performance results for larger samples from the use of $r_{20}$,
$r_{21}$, $r_{22}$ in place of $r_{10}$, $r_{11}$, $r_{12}$, and further, these criteria are not appreciably
affected by the presence of another outlier it would seem unwise to recommend
the use of $r_{10}$, $r_{11}$, $r_{12}$. However, note that for small samples (see Figures 11 and
12) the performances of $r_{10}$ and $r_{11}$ and $r_{12}$ are considerably better when a single
outlier is present. Therefore in larger (n > 10) samples $r_{20}$ or $r_{21}$ would appear
to be the best criteria. In samples of size 10 or less, $r_{10}$ or $r_{20}$ should be used;
$r_{21}$ if the extreme value at the opposite end should be avoided.

Fig. 11. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two location
errors in samples of size 5, $\alpha$ = 5%.

Fig. 12. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two scalar
errors in samples of size 5, $\alpha$ = 5%.

Fig. 13. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two location
errors in samples of size 15, $\alpha$ = 5%.

Fig. 14. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two scalar
errors in samples of size 15, $\alpha$ = 5%.

Fig. 15. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two location
errors in samples of size 5, $\alpha$ = 5%.

Fig. 16. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two scalar
errors in samples of size 5, $\alpha$ = 5%.

It should be noted in the comparisons that no model of contamination was
investigated which would cause one or more errors at both extremes in the
sample. It is obvious that the performance of $D_1$ and $D_2$ would be considerably
decreased while the performance of $r_{11}$, $r_{12}$, and $r_{21}$, $r_{22}$ would not be materially
affected since these criteria avoid values at the opposite extreme. Their repeated
use might discover most of such outliers, while $D_1$ or $D_2$ might fail on the first
trial.



Fig. 17. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two location
errors in samples of size 15, $\alpha$ = 5%.

Fig. 18. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two scalar
errors in samples of size 15, $\alpha$ = 5%.

Fig. 19. Performance of $B_1$ for various levels of significance when the population is 10%
contaminated with location errors.


6. Sampling from a contaminated population. In the previous sections the
performance of the various criteria were assessed for samples where a certain
number of contaminators were present. One might well ask why a test is needed
if it is known that contaminators are present. It would seem more realistic to
state that a certain per cent of contamination will occur in the long run and
that one will not know in any particular case whether 0, 1, 2, ... contaminators
will be present. One would then wish a criterion to indicate the presence of
contamination in a particular sample.

The performances of these criteria will be investigated for the same two
models of contamination and their performances will be reported as per cent of
total contamination discovered. The tests will be applied only once to each
sample. Repeated use of the criterion would in many cases increase the per cent
of total contamination discovered. It is not known what effect such a procedure
would have on the level of significance.

Investigation has been made for 5, 10, and 20% contamination. For example,
in samples of size 5 which have 10% contamination, on the average, 59.0% of
the samples will contain no “errors”, 32.8% will contain one, 7.3% two, 0.8%
three, 0.1% four, and 0.0% five. Thus in 100 samples of 5 which are 10%
contaminated with location errors having mean $\mu + 5\sigma$, about 59 contain no errors.
If the $r_{10}$ criterion is used with a 5% level of significance one value will be
“discovered” in 3.0 of the samples containing no errors. Of the 33 samples containing
one “error” the “error” would be discovered in 18 of these samples. This criterion
would discover none of the “errors” in samples containing more than one
“error”. We would have obtained 18 of the 50 contaminating values and 3 which
were members of the original population.

Fig. 20. Performance of $B_1$ for various levels of significance when the population is 10%
contaminated with scalar errors (panels: n = 5, n = 15).

Fig. 21. Performance of $B_1$ for various levels of contamination for location errors and
using the 5% level of significance.

Fig. 22. Performance of $B_1$ for various levels of contamination for scalar errors and
using the 5% level of significance.

Fig. 23. Performance of $r_{10}$, $D_1$, $r_{20}$, $D_2$ for the 5% level of significance
and sampling from a population which is 10% contaminated (panels: Location, Scalar).
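
The composition of the samples quoted above follows from treating each observation as a contaminator independently with probability equal to the contamination rate, so that the number of contaminators in a sample is binomial. The sketch below reproduces that arithmetic; the function names are hypothetical, and the independence assumption is the one implied by the quoted percentages.

```python
from math import comb


def contaminator_count_probabilities(n, p):
    """Binomial probabilities of 0, 1, ..., n contaminators in a sample
    of n when each observation is contaminated with probability p."""
    return [comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(n + 1)]


def expected_contaminators(samples, n, p):
    """Expected total number of contaminators in `samples` samples of n."""
    return samples * n * p


if __name__ == "__main__":
    probs = contaminator_count_probabilities(5, 0.10)
    for j, q in enumerate(probs):
        print(j, f"{100.0 * q:.3f}%")
    print(expected_contaminators(100, 5, 0.10))
    # Expected false "discoveries" among the clean samples at the 5% level:
    print(0.05 * 100 * probs[0])
```

The expected total of 50 contaminators in 100 samples of 5 is the denominator in the statement that 18 of the 50 contaminating values would be obtained, and 5% of about 59 clean samples gives the 3.0 false discoveries.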

When $\sigma$ is known the performance will increase when more contaminators
are present. Performance, however, has been measured by the discovery of a
single contaminator; i.e., the test has been used only once. Consequently the
level of performance, reported as per cent of total contamination discovered,
will decrease with increasing percent contamination. Repeated use of the test
criteria has not been investigated.


Fig. 24. Performance of $r_{10}$ ($D_1$) and $r_{22}$ ($D_1$, $r_{20}$, $r_{21}$) for various levels of significance
when the population is 10% contaminated with location errors.

Fig. 25. Performance of $r_{10}$ ($D_1$) and $r_{22}$ ($D_1$, $r_{20}$, $r_{21}$) for various levels of significance
when the population is 10% contaminated with scalar errors.


Criterion $B_1$ gives the best performance for both location and scalar errors for
the levels of contamination and levels of significance considered. A and $C_1$ are
only slightly inferior. $B_2$ is handicapped when more than one error is present;
thus its performance is poorer for heavier contamination. Figure 19 shows the
performance of $B_1$ for the different levels of significance, 10% contamination,
and the two sample sizes 5 and 15 for location errors. Figure 20 shows the results
for scalar errors. Figures 21 and 22 show the performance of $B_1$ for the 5%
level of significance for the different levels of contamination.

When $\sigma$ is not known the performance of various criteria will eventually
decrease as more and more contaminators are present in the sample even though
several of the criteria show improvement in discovering a single error if two
are present. The performance of these criteria is greatly affected by the size
of the sample. For samples of size 5, $r_{10}$ and $D_1$ perform alike, $r_{10}$ being superior
to the other r's ($r_{20}$ second best) for the levels of contamination considered,
and $D_2$ is inferior to $r_{20}$. Figure 23 compares the performance of $r_{10}$, $D_1$, $r_{20}$,
and $D_2$ for the 5% level of significance and 10% contamination. The results
for other levels of significance and contamination are comparable.

Fig. 26. Performance of $r_{10}$ ($D_1$) and $r_{22}$ ($D_1$, $r_{20}$, $r_{21}$) for various levels of contamination
for location errors and using the 5% level of significance.

Fig. 27. Performance of $r_{10}$ ($D_1$) and $r_{22}$ ($D_1$, $r_{20}$, $r_{21}$) for various levels of contamination
for scalar errors and using the 5% level of significance.

For samples of size 15, $r_{20}$, $r_{21}$ and $r_{22}$ perform alike as do $r_{10}$, $r_{11}$ and $r_{12}$. $D_1$
and $r_{20}$, $r_{21}$, $r_{22}$ perform approximately the same and are superior to $r_{10}$, $r_{11}$,
and $r_{12}$. Critical values are not available for $D_2$ for n > 12. The performances
of $D_1$, $r_{20}$, $r_{21}$ and $r_{22}$ are indicated by a single line in Figures 24, 25, 26, and 27
which show the effect of level of significance and level of contamination on the
performance of $D_1$, $r_{20}$, $r_{21}$ and $r_{22}$ for samples of size 15 and for $r_{10}$ ($D_1$) for
samples of size 5.

Fig. 28. A comparison of the performance of $r_{22}$ and $D_1$ for two scalar contaminators
when tests are made at one extreme only, $\alpha$ = 5%, n = 15.

7. Remarks and conclusions. Throughout the investigation of performance,
location errors were placed only at one extreme and scalar errors at either
extreme. The test for an error was made using as a suspected value the extreme
value in the direction of the location error or in the case of the scalar error the
value most distant from the mean. It can be expected then that if performance
were assessed when location errors could occur in either direction, different
results would be obtained. Also in the case of scalar errors if errors were always
sought at one particular extreme or at both extremes different results would be
obtained. If these changes were made in the models of contamination, those
criteria designed to avoid errors at the other extreme would have an advantage
over those which were not so designed for $\sigma$ unknown. If $\sigma$ is known the criteria
which do not avoid the other extreme would have an advantage over those
which do avoid the other extreme. These points just mentioned will be used to
discriminate between those criteria which were judged to be equal in
performance under the models used in the sampling study. For example, Figure 28
compares the performance of $r_{22}$ and $D_1$ for two scalar contaminators when
tests are made only at one extreme, $\alpha$ = 5%, n = 15.

1. For $\sigma$ known:

$B_1$ or $C_1$ should be used, or in small samples A, $B_1$ or $C_1$ should be used.

2. For $\sigma$ unknown:

$r_{10}$ should be used for very small samples, $r_{22}$ should be used for sample sizes
over 15. Probably $r_{11}$ would be best for sample sizes from about 8 to 13. If
simplicity in computation is not important and “errors” are not expected at both
extremes $D_1$ would do equally well. When critical values are available for larger
n, $D_2$ should prove useful in the larger sample sizes.

LITERATURE REFERRING TO CRITERIA LISTED IN SECTION 3

($B_1$) A. T. McKay, “The distribution of the difference between the extreme observation and
the sample mean in samples of n from a normal universe,” Biometrika, Vol. 27
(1935), pp. 466-471. Procedures for obtaining percentage values given.

($B_2$) J. O. Irwin, “On a criterion for the rejection of outlying observations,” Biometrika,
Vol. 17 (1925), pp. 238-250. Pr($B_2 > \lambda$), $\lambda$ = .1(.1)6.0, n = 2, 3, 10(10)100(100)1,000.
Tables concerning the second and third ordered observations are also given.

($C_1$) E. S. Pearson and H. O. Hartley, “The probability integral of the range in samples
of n observations from the normal population,” Biometrika, Vol. 32 (1942), pp.
301-310. 0.1%, 0.5%, 1.0%, 2.5%, 5%, 10%, n = 2(1)12, values to 20 available by
interpolation.

($C_2$) D. Newman, “The distribution of ranges in samples from a normal population,
expressed in terms of an independent estimate of the standard deviation,” Biometrika,
Vol. 31 (1940), pp. 20-30. 1% and 5% points for $C_2$; for w, n = 2(1)12, 20; s, d.f. =
5(1)20, 24, 30, 40, 60.

($C_2$) E. S. Pearson and H. O. Hartley, “Tables of the probability integral of the
studentized range,” Biometrika, Vol. 33 (1942), pp. 89-99. Upper and lower 5% and 1%
points for $C_2$; for w, n = 2(1)20; for s, d.f. = 10(1)20, 24, 30, 40, 60, 120, $\infty$.

($C_2$, $B_1$) K. R. Nair, “The distribution of the extreme deviate from the sample mean and
its studentized forms,” Biometrika, Vol. 35 (1948), pp. 118-144. $B_1$: upper and lower
.1%, .5%, 1%, 2.5%, 5%, 10% points for n = 3(1)9.

($B_1$, $D_1$, F, $D_2$) F. E. Grubbs, “Sample criteria for testing outlying observations,”
Annals of Math. Stat., Vol. 21 (1950), pp. 27-58. F, $B_1$: 1%, 2.5%, 5%, 10%, n ≤ 25;
$D_2$: 1%, 2.5%, 5%, 10%, n ≤ 20; $D_1$: 1%, 2.5%, 5%, 10%, n ≤ 25.

(F) W. R. Thompson, “On a criterion for the rejection of observations and the distribution
of the ratio of deviation to sample standard deviation,” Annals of Math. Stat.,
Vol. 6 (1935), pp. 214-219. 20%, 10%, 5%, n = 3(1)22(10)42, 102, 202, 502, 1002.

(F) E. S. Pearson and C. Chandra Sekar give a further discussion of F in “The efficiency of
statistical tools and a criterion for the rejection of outlying observations,”
Biometrika, Vol. 28 (1936), pp. 308-320. 10%, 5%, 2.5%, 1%, n = 3(1)19.

(r's) W. J. Dixon, “Ratios involving extreme values,” Annals of Math. Stat., to be
published. $r_{10}$, $r_{11}$, $r_{12}$, $r_{20}$, $r_{21}$, $r_{22}$: .5%, 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%,
60%, 70%, 80%, 90%, 95%, n ≤ 30.



DISTRIBUTIONS RELATED TO COMPARISON OF TWO 
MEANS AND TWO REGRESSION COEFFICIENTS 

By Uttam Chand¹

University of North Carolina 

Summary. We consider here the relative merits of different statistics available
for testing two means or two regression coefficients in relation to one-sided
(asymmetric) and two-sided (symmetric) alternatives in case of unequal population
variances. In so far as the Behrens-Fisher statistic is concerned we confine
ourselves to the consideration of the behavior of its probability of Type I error
in repeated sampling from populations with a fixed value of the unknown ratio
of variances. In connection with the tests between two means, the present
study takes its point of departure from the existing tests and investigates the
question of utilizing an approximately determinate knowledge about the unknown
ratio of variances. In connection with the comparison of two regression
coefficients and also of two linear regression functions, we consider the effect of
two concomitant sources of variation, viz., the unknown ratio of residual variances
and the ratio of the sums of squares of the fixed variates, on the probability of
Type I and Type II errors of certain well known statistics.


1. Introduction. Consider two independent samples $x_1, \dots, x_{n_1+1}$ and $x'_1, \dots, x'_{n_2+1}$
drawn from two normal populations with means $m_1$ and $m_2$ and variances $\sigma_1^2$ and $\sigma_2^2$.
Let $K = \sigma_1^2/\sigma_2^2$. If $K$ is known and $m_1 = m_2$, the quantity

$$t_K = (\bar{x} - \bar{x}')\left[\frac{S(x-\bar{x})^2 + K\,S'(x'-\bar{x}')^2}{n_1+n_2}\left(\frac{1}{n_1+1} + \frac{1}{K(n_2+1)}\right)\right]^{-1/2}$$

($t_1$ is Fisher's $t$; $S$ and $S'$ denote summation over the first and second sample) is
distributed according to "Student's" distribution with $n_1 + n_2$ d.o.f.² and for the
"Student's" hypothesis $H_0: m_1 = m_2$ provides a uniformly most powerful test against
an asymmetric alternative $H_1: m_1 >$ (or $<$) $m_2$ and a type $B_1$ test against a symmetric
alternative $H_2: m_1 \ne m_2$. If $K$ is unknown, certain approximate and exact tests have
been suggested from time to time to meet this situation.

Welch [1], [2], using an approximation to the distribution of $t_1$, was the first
to point out that if $K$ is unknown and we assume it to be equal to unity, then
the probability of Type I error of the $t_1$-test is subject to large variations as $K$
varies from 0 to $\infty$. He also pointed out that the statistic

$$v = (\bar{x} - \bar{x}')\left[\frac{S(x-\bar{x})^2}{n_1(n_1+1)} + \frac{S'(x'-\bar{x}')^2}{n_2(n_2+1)}\right]^{-1/2}$$

¹ Now Assistant Professor of Mathematical Statistics at Boston University.
² Degrees of freedom.




which does not have "Student's" distribution for $K = 1$, has the advantage
that its probability of Type I error is subject to less variation with respect to $K$.
His approximate results were later confirmed by Hsu [3], who obtained the
distribution of quantities $u_1 (= t_1^2)$ and $u_2 (= v^2)$ and also showed that these tests
are unbiased in the sense of Neyman and Pearson. Hsu concluded on the basis
of his investigations that when the sample sizes are equal and not very small,
we may safely use $u_1 (= u_2)$ as if $K$ were unity. This also had been pointed out
by Welch.

If on the basis of past experience some approximate value $k$ of $K$ were available,
one would like to know if such a choice in some rough neighborhood of $K$ would
in any way improve the claim of $t_k$ ($= t_K$ for $K = k$) for the hypothesis $m_1 = m_2$.
The distribution of this generic quantity $t_k$ ($= t_1$ for $k = 1$; $= v$ for
$k = n_1(n_1+1)/[n_2(n_2+1)]$) will be obtained in Section 2.1. It will be shown that
variation in the probability of Type I error of $t_k$ with respect to $K$ for any $k$,
except when $t_k = v$, is essentially similar in character to that of $t_1$ [3] and is
very sensitive in a neighborhood of $K$ in which one would very often be interested
(Section 2.1). This is also true of the behavior of the power function of $t_k$ with
respect to $K$. Consequently a $t_k$ type of statistic will be unsuitable in general for
utilizing an approximately determinate knowledge of $K$.
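The relation between the generic statistic and Welch's statistic can be checked numerically. The following is only an illustrative sketch: the function names `t_k` and `welch_v` and the sample values are hypothetical, and it assumes that the two samples have $n_1 + 1$ and $n_2 + 1$ observations and that the statistics are the pooled form (with the second sum of squares weighted by $k$) and the unpooled form described in this section.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def _ss(xs):
    """Sum of squared deviations from the sample mean."""
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs)

def t_k(x, y, k):
    """Pooled statistic computed as if sigma1^2 / sigma2^2 were equal to k."""
    n1, n2 = len(x) - 1, len(y) - 1            # d.o.f.; sample sizes are n + 1
    pooled = (_ss(x) + k * _ss(y)) / (n1 + n2)
    scale = 1.0 / (n1 + 1) + 1.0 / (k * (n2 + 1))
    return (_mean(x) - _mean(y)) / math.sqrt(pooled * scale)

def welch_v(x, y):
    """Unpooled statistic v: each sum of squares divided by n(n + 1)."""
    n1, n2 = len(x) - 1, len(y) - 1
    var = _ss(x) / (n1 * (n1 + 1)) + _ss(y) / (n2 * (n2 + 1))
    return (_mean(x) - _mean(y)) / math.sqrt(var)

# Hypothetical data: n1 = 3, n2 = 5.
x = [4.1, 5.3, 6.0, 5.2]
y = [3.9, 4.4, 5.1, 4.0, 4.6, 5.0]
k_star = (3 * 4) / (5 * 6)                     # n1(n1+1) / [n2(n2+1)]
print(t_k(x, y, 1.0), t_k(x, y, k_star), welch_v(x, y))
```

With $k = n_1(n_1+1)/[n_2(n_2+1)]$ the second and third printed values agree up to rounding, while $k = 1$ gives Fisher's $t$.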

It is not possible to infer directly from Hsu's work the relative merits of $t_1$
and $v$ in relation to asymmetric aspects of "Student's" hypothesis. His basic
conclusions as regards unbiasedness and the nature of variations in Type I
error in the symmetric case also hold for the asymmetric case except that the
Type I variations in $t_1$ and $v$ are less for asymmetric than for symmetric
comparisons (Section 2.5 and Table II). Furthermore it appears (Section 2.6 and
Table III) that with respect to the variations of $K$ both the asymmetric and
symmetric power functions of $t_1$ are likely to be more sensitive than those of $v$.
Since for equal d.o.f. both the asymmetric probability of Type I error and
power function are insensitive to the vagaries of the 'nuisance' parameter $K$,
there is an a fortiori reason for using $v (= t_1)$ as if $K$ were unity.

Scheffé [4] considered the statistic

$$\mathcal{S} = (\bar{x} - \bar{x}')\left[\frac{\sum_{i=1}^{n_1+1}(u_i - \bar{u})^2}{n_1(n_1+1)}\right]^{-1/2} \qquad (n_1 \le n_2)$$

(equivalent to paired difference $t$ when $n_1 = n_2$), where $u_i = x_i - \left(\frac{n_1+1}{n_2+1}\right)^{1/2} x'_i$
and where it is assumed that the variates in each sample have been randomized.
This is essentially a "Student's" $t$ comparison based on $n_1$ d.o.f. and as shown by
Scheffé it is impossible to get a suitable statistic with the $t$-distribution with
more than $n_1$ d.o.f. The statistic $v$ has the $t$-distribution only when $K = \infty$ ($n_1$
d.o.f.), $K = 0$ ($n_2$ d.o.f.) and $K = n_1(n_1+1)/[n_2(n_2+1)]$ ($n_1 + n_2$ d.o.f.). For any given
$n_1$, $n_2$, $K$ and $P$ we can solve $P = P(v > t_0 \mid H_0)$ for $t_0$ and thus indirectly obtain
from the tabulated values of the $t$-distribution the number of 'effective' d.o.f.
which will thus adjust $v$ to any preassigned level of significance. We try to
show in Section 2.6 that in situations where some approximate knowledge of $K$
is available, the statistic $v$ seems to have a decided advantage over any other
statistic having the $t$-distribution. We show by actual computations that Welch's
formula [2] provides a conservative estimate for the effective d.o.f. in the light of
which this comparison will be considered.
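As a minimal sketch of the construction just described (the function names `scheffe_statistic` and `paired_t` and the data are hypothetical; the sketch assumes the smaller sample is passed first, that its values are paired with the first $n_1 + 1$ values of the randomized larger sample, and that $u_i = x_i - [(n_1+1)/(n_2+1)]^{1/2} x'_i$):

```python
import math

def scheffe_statistic(x, y):
    """Scheffé-type statistic; x must not be longer than y."""
    if len(x) > len(y):
        raise ValueError("put the smaller sample first")
    n1, n2 = len(x) - 1, len(y) - 1
    ratio = math.sqrt((n1 + 1) / (n2 + 1))
    # zip pairs x_i with the first n1 + 1 values of y (assumed already randomized)
    u = [xi - ratio * yi for xi, yi in zip(x, y)]
    u_bar = sum(u) / len(u)
    ss_u = sum((ui - u_bar) ** 2 for ui in u)
    diff = sum(x) / len(x) - sum(y) / len(y)
    return diff / math.sqrt(ss_u / (n1 * (n1 + 1)))

def paired_t(x, y):
    """Ordinary paired-difference t for samples of equal size."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = sum(d) / n
    return d_bar / math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n * (n - 1)))

x = [4.1, 5.3, 6.0, 5.2]
y_equal = [3.9, 4.4, 5.1, 4.0]
y_longer = [3.9, 4.4, 5.1, 4.0, 4.6, 5.0]
print(scheffe_statistic(x, y_equal), paired_t(x, y_equal))
print(scheffe_statistic(x, y_longer))
```

For equal sample sizes the two values on the first printed line coincide, which is the equivalence with the paired difference $t$ noted in the text.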

The Behrens-Fisher fiducial test employing the statistic $d$ [5], [6], which has
essentially the same structural form as $v$, has given rise to much controversy
essentially because of inconsistencies arising from tests of significance based
on the fiducial distribution of unknown parameters. We attempt to show in
Section 2.7 that the fiducial test in general is 'conservative' in detecting significant
results in repeated sampling from populations with a fixed value of the unknown
ratio of variances.

In the case of comparison of two regression coefficients when the residual
variances are unequal, we are faced with a similar type of problem. Consider
two samples $y_\mu \mid x_\mu$ and $y'_\nu \mid x'_\nu$ ($\mu = 1, \dots, n_1 + 1$; $\nu = 1, \dots, n_2 + 1$), where
$x_\mu$ and $x'_\nu$ are fixed and $y_\mu$ and $y'_\nu$ are normally and independently distributed
according to $N(\alpha_1 + \beta_1(x_\mu - \bar{x}), \sigma_1^2)$ and $N(\alpha_2 + \beta_2(x'_\nu - \bar{x}'), \sigma_2^2)$ respectively.
For the hypothesis $\beta_1 = \beta_2$ when the alternatives do not specify anything except
$\beta_1 > \beta_2$ or $< \beta_2$, or $\beta_1 \ne \beta_2$, we shall consider the merits of statistics $t^*$ and $v^*$
which correspond to statistics $t_1$ and $v$ for the two means. While the statistic $t^*$
is sensitive to the variation of both $K = \sigma_1^2/\sigma_2^2$ and $w$, the ratio of the sums of
squares of the fixed variates, the statistic $v^*$ is insensitive to the variation of
both. Barankin³ has extended Scheffé's test to the comparison of two regression
coefficients under the above assumptions. The statistic proposed by Barankin
has Student's distribution with $n_1 - 1$ d.o.f. ($n_1 \le n_2$) and provides the only
exact unbiased test so far known. While Scheffé's test for the comparison of
two means and Barankin's test for the comparison of two regression coefficients
should not be used when $K$ is known and were never intended to utilize any
available approximate information about $K$, the question of investigating
the possibility of using $v^*$ in the latter situation is not without interest (Section 3).
In Section 4 we consider the hypothesis of equality of two linear regression
functions, viz., $H_0: \alpha_1 = \alpha_2, \beta_1 = \beta_2$ when the alternatives do not specify anything
except $\alpha_1 \ne \alpha_2$ or $\beta_1 \ne \beta_2$.

In studying the behavior of the power function and the probability of Type I
error of certain statistics under discussion we have made full use of Hsu's method
and consequently only essential details have been given here.

2. Hypothesis of equality of two means when variances are unequal.

2.1. The distribution of $t_k$ for any values of $n_1$ and $n_2$. Consider the test function
$t_k$ ($= t_K$ for $K = k$; Section 1) where $k$ is some inexact value of $K$. This can be

³ E. W. Barankin, "Extension of the Romanovsky-Bartlett-Scheffé test," Proc. Berkeley
Symposium on Math. Stat. and Prob., University of California Press, 1949, pp. 433-449.





put in the form $t_k = (\xi + \delta)(b\chi_1^2 + c\chi_2^2)^{-1/2}$, where $\xi$ is $N(0, 1)$ and the $\chi^2$'s
have independent $\chi^2$-distributions with $n_1$ and $n_2$ d.o.f., and where

$$\delta = (m_1 - m_2)\left(\frac{\sigma_1^2}{n_1+1} + \frac{\sigma_2^2}{n_2+1}\right)^{-1/2},$$

$$b = (K/k)\,(n_1+n_2)^{-1}\,[k(n_2+1) + n_1 + 1]\,[K(n_2+1) + n_1 + 1]^{-1},$$

$$c = (n_1+n_2)^{-1}\,[k(n_2+1) + n_1 + 1]\,[K(n_2+1) + n_1 + 1]^{-1},$$

$$b/c = K/k.$$


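In what follows we shall omit the subscript $k$ from $t_k$. The joint probability
element of $\xi$, $\chi_1^2$ and $\chi_2^2$ is given by

$$dF(\xi, \chi_1^2, \chi_2^2) = \tfrac{1}{4}(2\pi)^{-1/2}\,[\Gamma(n_1/2)\Gamma(n_2/2)]^{-1}\,e^{-\frac12(\xi^2+\chi_1^2+\chi_2^2)}\,(\chi_1^2/2)^{n_1/2-1}\,(\chi_2^2/2)^{n_2/2-1}\,d\xi\,d(\chi_1^2)\,d(\chi_2^2).$$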


We transform to new variables $t$, $r$ and $\theta$ by the relations

$$\begin{aligned}
\xi + \delta &= t\,(b\chi_1^2 + c\chi_2^2)^{1/2}, \\
b\chi_1^2 &= r^2\cos^2\theta \qquad (0 \le \theta \le \pi/2), \\
c\chi_2^2 &= r^2\sin^2\theta \qquad (-\infty \le t \le +\infty),
\end{aligned}$$

and integrate out $r$. To integrate out $\theta$ we put $z = \sin^2\theta$ if $b < c$ and $z = \cos^2\theta$
if $b > c$. This reduces the integration w.r.t. $\theta$ to a series of hypergeometric
integrals. We finally have the following form for the frequency function of $t$:


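(2.1.1)
$$g(t) = \frac{e^{-\delta^2/2}\,(b/c)^{n_2/2}\,b^{1/2}}{\pi^{1/2}\,\Gamma\!\left(\frac{n_1+n_2}{2}\right)} \sum_{r=0}^{\infty} \frac{(\delta t)^r\,(2b)^{r/2}\,\Gamma\!\left(\frac{n_1+n_2+r+1}{2}\right)}{r!\,(1+bt^2)^{(n_1+n_2+r+1)/2}}\; F\!\left(\frac{n_1+n_2+r+1}{2}, \frac{n_2}{2}, \frac{n_1+n_2}{2}, \frac{1-b/c}{1+bt^2}\right) \qquad (b < c),$$

where $F$ denotes the hypergeometric function. As a check if we put $b = c =
(n_1 + n_2)^{-1}$, we get the frequency function of non-central $t$ for $n_1 + n_2$ d.o.f. For
the case $b > c$ we have only to interchange $b$ with $c$ and $n_1$ with $n_2$.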

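The null distribution of $t$ ($\delta = 0$) is an even function of $t$; consequently the
forms of the single and two-equal-tailed probability of Type I error will be the
same except for the constant factor $\tfrac12$. If we let $\beta_1(\delta, K, k, n_1, n_2) = \int_{t_0}^{\infty} g(t)\,dt$
denote the single upper tail power function of $t$, from (2.1.1) we obtain

(2.1.2)
$$\beta_1(\delta, K, k, n_1, n_2) = \tfrac12\,e^{-\delta^2/2}\,(K/k)^{n_2/2} \sum_{h=0}^{\infty}\sum_{r=0}^{\infty} \frac{\Gamma\!\left(\frac{n_2}{2}+h\right)}{\Gamma\!\left(\frac{n_2}{2}\right)h!}\,(1-K/k)^h\,\frac{(\sqrt{2}\,\delta)^r\,\Gamma\!\left(\frac{r+1}{2}\right)}{\pi^{1/2}\,r!}\; I_{x_0}\!\left(\frac{n_1+n_2}{2}+h, \frac{r+1}{2}\right),$$

where $x_0 = (1 + bt_0^2)^{-1}$ and $I_x(p, q)$ is the incomplete beta ratio. To obtain the
two equal tailed power function $\beta_2(\delta, K, k, n_1, n_2)$ we need only change $r$ into $2r$
and omit the factor $\tfrac12$.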

2.2. Distribution of $t_k$ for even values of $n_1$ and $n_2$. (For notation refer to Section
2.1). When $n_1$ and $n_2$ are even, the method of characteristic functions yields a
single infinite series for the distribution of $t_k$, and when $\delta = 0$ this series reduces
to $(n_1+n_2)/2$ terms. The characteristic function of $X = b\chi_1^2 + c\chi_2^2$ is given by
$\phi(\tau) = (1 - 2bi\tau)^{-n_1/2}(1 - 2ci\tau)^{-n_2/2}$. To obtain the form of the frequency function
of $X$ we make use of the inversion theorem and integrate round a standard
contour in the lower half of the complex plane. The distribution of $t_k$ can then be
obtained from the joint probability element of $\xi$ and $X$. We obtain the following
form for the single tailed power function of $t_k$:


(2 2 . 1 ) 



{K > h) 


where $x_0$ has been defined in the previous section and $x_0' = (1 + ct_0^2)^{-1}$.

2.3. Unbiasedness of a test based on $t_k$. Since the single and two tailed forms
of the power function of $t_k$ (Section 2.1) are essentially the same functions of the
standardised 'distance' $\delta$, following Hsu [3] we can show that $\partial\beta_1/\partial\delta > 0$ and
$\partial\beta_2/\partial\delta > 0$ for any fixed $K$ and $k$; and consequently such a generic type of statistic
provides an unbiased test both against symmetric and asymmetric alternatives.

2.4. Variations in the power function and the probability of Type I error of $t_k$.
For the case $k = 1$, Hsu [3] has already shown that the probability of Type I
error of the statistic $t_1$ is subject to large variations w.r.t. $K$. He also pointed
out that the behavior of the derivative of its power function w.r.t. $K$ for fixed $\delta$
was similar to that of its probability of Type I error w.r.t. $K$. We shall presently
see that $t_k$ also shares this property with $t_1$.

In the first place one would like to know if any choice of $k$ in a small neighborhood
of $K$ would stabilize the variations in the Type I error of $t_k$ to such an
extent as to make it approximately insensitive to that difference between $k$ and
$K$. With this end in view we shall examine the nature of variations in the probability
of Type I error of $t_k$ w.r.t. $K$ for any fixed $k$.

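From (2.1.2) by putting $\delta = 0$ we obtain

(2.4.1)
$$P = P(t > t_0) = \tfrac12\,(K/k)^{n_2/2} \sum_{h=0}^{\infty} \Gamma\!\left(\frac{n_2}{2} + h\right)(1 - K/k)^h \left[\Gamma\!\left(\frac{n_2}{2}\right)\Gamma(h+1)\right]^{-1} I_{x_0}\!\left(\frac{n_1+n_2}{2} + h, \tfrac12\right).$$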

We now differentiate (2.4.1) and after simplification obtain 

g < CiiK/krWm -f 1) - ni(ni -h l)/k][KirH + 1) + ni + 1]"' (K < k). 

Similarly 
dP 


^ > Cilthini -t- 1) — 7ii(ni -f- l)/k][K{p^ -f 1 ) ni -)- 1] ^ 


{K > k), 


where $C_1$ and $C_2$ are certain positive constants independent of $K$ and $k$.
If $k = n_1(n_1+1)/[n_2(n_2+1)]$ we have $\partial P/\partial K \lessgtr 0$ for $K \lessgtr k$.

This is the case when $t_k$ is identical with the statistic $v$ defined in Section 1
and the probability of Type I error curve expressing $P$ as a function of $K$ has a
minimum at this point: for $n_1 < n_2$ the minimum occurs for a value of $K < 1$
and vice versa. And since $v$ is known to be insensitive to the variation of $K$ [3],
$t_k$ is therefore insensitive to the variation of $K$ for this value of $k$.

For any other assumed value of $k$ the curve either starts decreasing from
$K = \infty$ or from $K = 0$ to the point where $K = k$ depending upon the values of
$n_1$ and $n_2$. In each case the ordinate of the curve continues to decrease for some
distance; it may decrease to a minimum and then start increasing or else decrease
indefinitely. For fixed $\delta$ the power function of $t_k$ also has a minimum when
$K = k = n_1(n_1+1)/[n_2(n_2+1)]$; and for any other $k$ the behavior of its power function is
similar to that of its probability of Type I error. For the case $k = 1$ numerical
values of the single and two-tailed values of the probability of Type I error
and power function for different values of $n_1$, $n_2$ and $K$ are given in Tables II
and III (Section 2.5).

In certain practical situations it may happen for example that on the basis
of past experience one can determine $k$ so that $\tfrac12 \le |k - K| < 2$. The question
arises: how sensitive is $t_k$ to such a neighborhood for any $k$, $K$, $n_1$ and $n_2$?
That it is hard to provide a practically useful answer to this question will be
apparent from the nature of the distribution of $t_k$, which depends both on
$K$ and $k$ and not merely on their ratio. The following Table I will indicate how
in such a small neighborhood $P(|t_k| > t_0)$ can be in serious error in two different
directions.
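The series (2.4.1) can be evaluated numerically to check entries of Table I. The sketch below is an illustration under stated assumptions, not the author's computing procedure: the function names are hypothetical; the incomplete beta ratio $I_x(a, \tfrac12)$ is obtained by Simpson's rule rather than from tables; the two-tailed probability is taken as (2.4.1) without the factor $\tfrac12$; for $K > k$ the roles of $b$, $c$ and of $n_1$, $n_2$ are interchanged as stated in Section 2.1; and $K = 5$ is assumed, since the tabulated value .05 at $k = 5$ is exact only when $k = K$.

```python
import math

def inc_beta_half(x, a, steps=1000):
    """Incomplete beta ratio I_x(a, 1/2) for a >= 1, by Simpson's rule.

    With y = 1 - s^2 the integral of y^(a-1) (1-y)^(-1/2) over (0, x)
    becomes 2 * integral of (1 - s^2)^(a-1) over (sqrt(1-x), 1).
    """
    lo = math.sqrt(1.0 - x)
    h = (1.0 - lo) / steps                      # steps must be even

    def f(s):
        return max(0.0, 1.0 - s * s) ** (a - 1.0)

    total = f(lo) + f(1.0)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    integral = 2.0 * total * h / 3.0
    log_beta = math.lgamma(a) + math.lgamma(0.5) - math.lgamma(a + 0.5)
    return integral / math.exp(log_beta)


def type1_two_sided(K, k, n1, n2, t0, max_terms=2000, tol=1e-13):
    """P(|t_k| > t0) when m1 = m2: series (2.4.1) without the factor 1/2."""
    common = (k * (n2 + 1) + n1 + 1) / ((n1 + n2) * (K * (n2 + 1) + n1 + 1))
    b, c = (K / k) * common, common
    if b <= c:                                  # K <= k
        rho, nu, x0 = b / c, n2 / 2.0, 1.0 / (1.0 + b * t0 * t0)
    else:                                       # K > k: swap b with c, n1 with n2
        rho, nu, x0 = c / b, n1 / 2.0, 1.0 / (1.0 + c * t0 * t0)
    a = (n1 + n2) / 2.0
    weight, total = 1.0, 0.0
    for h in range(max_terms):
        total += weight * inc_beta_half(x0, a + h)
        weight *= (nu + h) / (h + 1.0) * (1.0 - rho)   # next series coefficient
        if weight < tol:
            break
    return rho ** nu * total


# Setting of Table I: n1 = 2, n2 = 4, t0 = 2.447, K assumed equal to 5.
for k in range(1, 8):
    print(k, round(type1_two_sided(5.0, float(k), 2, 4, 2.447), 4))
```

Under these assumptions the values for $k = 1$ and $k = 5$ come out close to the tabulated .1129 and .05, and the probabilities decrease steadily as $k$ increases.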

2.5. Statistics $t_1$ and $v$ in relation to asymmetric and symmetric aspects of
"Student's" hypothesis. Statistics $t_1$ and $v$ are special cases of $t_k$ and the behavior
of their probability of Type I error and power function has already been discussed
(Sections 2.3 and 2.4). In this section we compare the single-tailed and two-tailed
values of the probability of Type I error and power function in the light
of several particular examples. In all these calculations e.g. in $P(t > t_0)$ and
TABLE I

Variations in $P(|t_k| > t_0)$ with respect to $k$ for fixed $K$
($K = 5$; $n_1 = 2$, $n_2 = 4$, $t_0 = 2.447$)

k                   1       2       3       4       5       6       7
P(|t_k| > t_0)    .1129   .0936   .0749   .0607   .05     .0418   .0366


TABLE II

Variations in the symmetric and asymmetric probability of Type I error of $v$ and $t_1$ in
relation to the unknown ratio of variances $K$


K 

0 

.125 

5 

1 

2 

4 

8 

m 

ee 

% pOlDt of 
tabulated 

m - n, “ 3 

,074 

0033 

.0604 

.05 

0504 

0568 

0633 

0691 

.074 

sisglo tailed 5% 

t 

' 002 

0881 

0525 

.05 

0525 

.0507 

0881 

.0770 

m 

Cwo-tailed 6% 

<< 

.034 

.0181 

0110 

01 

0110 

013$ 

.0181 

0227 

034 

two'tailed 1% 

rtj « 4, nj ■■ 18 

.0112 

01201 

.0142 

0106 

0227 

0266 

0203 

0305 

0324 

Bingle tailed 1% 

D " 

012 

.omit 

0107 

.0238 

.0294 

0360 

.0407 

0433 

.0406 

two-tailed 1% 

711 ** 8, ni **• 4 

,076 

0087 

0598 

.0543 

0541 

.0614 

0521 

0531 

066 

single tailed h% 

ni => 4, 111 SB i(j 

000 U 

00043 

00310 

01 

0221 

.0483 

0793 

0864 

133 

single tailed 1% 

i 

00007 

.00031 

,00244 

01 

.0310 

0602 

.1160 

1544 

222 

two-tailcd 1% 

ni “ 8, nj “ 4 

,1342 

.1056 

0710 

06 

0368 

.0287 

.0246 

,0224 

.0204 

Single tailed 5% 


t n - 01 when K = ,074 
t P w 06 ■when = 3 0 


$P(|t| \ge t_0')$, $t_0$ refers to the single and $t_0'$ to the two tailed values of Fisher's $t$
for the appropriate number of d.o.f. Tables II and III give the approximate
values for the probability of Type I error and the power function respectively
both against symmetric and asymmetric alternatives.

For equal sample sizes ($v = t_1$) the Type I error and power function curves,
representing probability of Type I error and power function as a function of $K$,
have a minimum when $K$ is unity and a maximum occurs when $K$ is either zero or
infinity. Maximum values of the probability of Type I error for several equal
sample sizes are given in Table IV. It appears that for equal sample sizes the
probability of Type I error and the power function are likely to be insensitive
to the variation of $K$. We also notice in this connection that while the single
tailed values of the probability of Type I error are less than those of the two
tailed values, the values of the two tailed power function for $\delta = 1$ are less
than the corresponding single tailed values. This appears to be true also for the
statistic $v$ when $n_1 \ne n_2$. For unequal sample sizes also the probability of Type I
error and the power function of $t_1$ are likely to be more sensitive to the variation
of $K$ than those of $v$. It may be pointed out in the sequel that while it is recognized
that for unequal d.o.f. a fair comparison of the probability of Type I error and
the power function of $v$ with those of $t_1$ ought to adjust $v$ and $t_1$ to the same level
of significance, namely the same maximum (for all $K$) probability of Type I
error, this would not alter our conclusions about the sensitive nature of $t_1$.

TABLE III⁴

Variations in the asymmetric and symmetric power function of $t_1$ and $v$ corresponding to the
5% point of tabulated $t$ ($\delta = 1$)



K - 

(I S 1 

2 

*0 


Tii = nj = 3 

189 .111 .137 

141 

.189 

symmetric 

u = il 

269 .221) 2255 

.229 

.269 

asymmetric 

Hi = 8, ru = 4 

351 .202 .152 

.112 

.OR'l 

symmetric 

f. 

428 .204 .2*12‘ 

.194 

.122 

asymmetric 

nj = 8, uj = 4. 

208 .100 .162 

.156t 

.168 

symmotno 

V 

.286 250 . 2-17 

.2441 

.255 

asymmetric 

† minimum of .152 is reached for $K = 3.6$.
‡ minimum of .242 is reached for $K = 3.6$.





TABLE IV

Maximum probability of Type I error of $v$ ($= t_1$) for equal degrees of freedom


Symintttlc 



Aiymmetric 

fii + 1 " ffa + 1 

5% 1% 


5% 

1 % 

7 

,0721 .0224 


m 

,0182 

9 




.0162 

11 



.0676 

.0160 

15 

.0598 .0152 


.0565 

.0136 

21 

.0669 .0137 


.06.38 

.0125 


2.6. Statistic $v$, Scheffé's test and paired difference $t$. If $K$ is known, $v$ or Scheffé's
statistic $\mathcal{S}$ should not be used. If $K$ is unknown, $\mathcal{S}$ is an ingenious device for
getting a Student's $t$ with $\min(n_1, n_2)$ d.o.f. and provides the only exact unbiased
test so far known. In such a situation since nothing is known about $K$, a
fair comparison of the power function of $\mathcal{S}$ with $v$ ought to adjust $v$ to the same
maximum probability of Type I error for all $K$ (maximum will occur for $K = 0$
or $K = \infty$ according as $n_1 \gtrless n_2$); and at such a maximum significance level it is

⁴ The author acknowledges with pleasure the help given in the preparation of this table
by Miss Elizabeth Shuhany of the Statistical Laboratory, Boston University.
⁵ Values taken from [7].













recognized that $v$ cannot be uniformly better than $\mathcal{S}$. For samples of equal
size $n$ the use of the paired difference $t$ with $n - 1$ d.o.f. (equivalent to $\mathcal{S}$ when
$n_1 = n_2$, Section 1) provides a suitable test for two reasons: (i) it is exact and
(ii) as shown by Walsh [8] it has a high power efficiency.

If any approximate a priori information about $K$ is available, $v$ appears to
be the only suitable statistic to utilize such information. While $\mathcal{S}$ was not intended
to cope with such a situation, $t_k$ (Section 2.4) has been shown to be unsuitable.
Since $v$ is insensitive to the variation of $K$, we shall not be far wrong in using
'effective' d.o.f. based upon an assumed value $k$ of $K$ satisfying some such relation
as $\tfrac12 < |k - K| < 2$. The effective d.o.f. of $v$ as given by Welch [1] and as given
by $P = P(v > t_0)$ or by $P = P(|v| > t_0)$ for fixed $P$ (listed in Table V as calculated
d.o.f.) are identical for (i) $K = 0$, $1$ and $\infty$ ($n_1 = n_2$) and (ii) $K = 0$,
$n_1(n_1+1)/[n_2(n_2+1)]$ and $\infty$ ($n_1 \ne n_2$). For other values of $K$ it appears from Table V that Welch's
formula errs on the conservative side. The effective number of d.o.f. vary between
$n_1 + n_2$ and $\min(n_1, n_2)$ (cf. d.o.f. for $\mathcal{S}$). Consequently in the absence of any

Sample Size 


n\ -f I "Hi + I “ 
ni + I “ nt + 1 ” 
ni + 1 “ 0, Til -f-1 


TABLE V

Adjusted power function of $v$ in the light of 'effective' degrees of freedom

I Adhsted ummettic^power function of t 1 EBective d.o f. 

for probability ol Type I error of .05 


K - 

6 » 
0 .125 

1 

4 

00 

JC = 0 

5 = 

125 

2 

4 

I Calculated 

“ ;ir = o 125 4 

» 


204 

204 

174 

384 

470 

470 

.384 j 2 3 30 

3 36 

2 

225 

.2.80 

230 

225 

550 

.5*tl 

5KI 

650 ' 1) 0 14 

0 H 

e 

1 210 

227 

2>I2 

2.33 

504 

.050 

5^14 

572 ’ 4 0 60 

n flO 

p 


Welch's formula 


2 2 04 2 94 2 
S 8 82 8 82 6 
4 6 14 11 90 8 


best unbiased test and in the light of any approximate information about $K$ it
would appear that $v$ has a decided advantage over any other statistic.
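The behavior of the 'effective' d.o.f. at the special values of $K$ mentioned above can be illustrated numerically. The sketch below assumes that "Welch's formula" is the familiar approximation $f = (a_1 + a_2)^2 / (a_1^2/n_1 + a_2^2/n_2)$ with $a_1 = K/(n_1+1)$ and $a_2 = 1/(n_2+1)$; the paper does not print the formula in this section, so this form and the function name `welch_effective_dof` are assumptions.

```python
def welch_effective_dof(K, n1, n2):
    """Approximate d.o.f. of v when K = sigma1^2 / sigma2^2 is treated as known.

    a1 and a2 are the variances of the two sample means in units of sigma2^2;
    the sample sizes are n1 + 1 and n2 + 1.
    """
    a1 = K / (n1 + 1)
    a2 = 1.0 / (n2 + 1)
    return (a1 + a2) ** 2 / (a1 ** 2 / n1 + a2 ** 2 / n2)

n1, n2 = 8, 4
k_star = n1 * (n1 + 1) / (n2 * (n2 + 1))       # value of K at which v is exactly t
for K in (0.0, 0.125, 1.0, k_star, 8.0, 1e12):
    print(K, welch_effective_dof(K, n1, n2))
```

Under this assumption the formula returns $n_2$ at $K = 0$, $n_1 + n_2$ at $K = n_1(n_1+1)/[n_2(n_2+1)]$ and approaches $n_1$ as $K \to \infty$, in line with the cases in which the text says the two determinations of the effective d.o.f. are identical.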

2.7. The Behrens-Fisher test in repeated sampling. Consider the statistic

$$d = (\bar{x} - \bar{x}')\,(s_1^2 + s_2^2)^{-1/2} = t_1\sin\theta - t_2\cos\theta,$$

where $s_1^2$ and $s_2^2$ are the unbiased estimates of the variances of the means $\bar{x}$ and $\bar{x}'$
respectively, $t_1$ and $t_2$ have independent "Student's" distributions with $n_1$ and $n_2$
d.o.f. respectively, and $\tan\theta = s_1/s_2$. On the basis of the 'fiducial' distribution of
$\sigma_1$ and $\sigma_2$ Fisher [6] regards $d$ as a "mixture" of $t_1$ and $t_2$ with constant coefficients.
It is to be noted that if $s_1$ and $s_2$ are fixed in the classical sense $t_1$ and $t_2$ have
independent normal conditional distributions with zero means and variances
$\sigma_1^2/s_1^2$ and $\sigma_2^2/s_2^2$ respectively; and if $s_1$ and $s_2$ vary in their own distribution $d$ is
identical with $v$ (Section 1).
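The decomposition of $d$ into $t_1\sin\theta - t_2\cos\theta$ is an algebraic identity and can be verified on any pair of samples. In the sketch below the function name `behrens_fisher_d`, the data and the centring values `m1`, `m2` are hypothetical; $s_1^2$ and $s_2^2$ are taken as the sums of squares divided by $n(n+1)$, i.e. the estimated variances of the two means, with sample sizes $n_1 + 1$ and $n_2 + 1$.

```python
import math

def behrens_fisher_d(x, y, m1=0.0, m2=0.0):
    """Return d and the mixture t1*sin(theta) - t2*cos(theta) for two samples."""
    n1, n2 = len(x) - 1, len(y) - 1
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    # estimated standard errors of the two sample means
    s1 = math.sqrt(sum((v - xb) ** 2 for v in x) / (n1 * (n1 + 1)))
    s2 = math.sqrt(sum((v - yb) ** 2 for v in y) / (n2 * (n2 + 1)))
    d = ((xb - m1) - (yb - m2)) / math.hypot(s1, s2)
    t1, t2 = (xb - m1) / s1, (yb - m2) / s2
    theta = math.atan2(s1, s2)                  # tan(theta) = s1 / s2
    return d, t1 * math.sin(theta) - t2 * math.cos(theta)

x = [4.1, 5.3, 6.0, 5.2]
y = [3.9, 4.4, 5.1, 4.0, 4.6, 5.0]
print(behrens_fisher_d(x, y, m1=1.0, m2=0.5))
print(behrens_fisher_d(x, y))                   # m1 = m2: d has the form of v
```

When $m_1 = m_2$ the first component is the difference of the sample means divided by $(s_1^2 + s_2^2)^{1/2}$, which is the structural form shared with $v$.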

Neyman [9] considered the integral of the joint probability law of $\bar{x}$, $\bar{x}'$, $s_1$, $s_2$
over the set $|\bar{x} - \bar{x}'|\,(s_1^2 + s_2^2)^{-1/2} < t_1\sin\theta - t_2\cos\theta$, where the quantity on the right also
depends upon $s_1$ and $s_2$ and is the quantity $d$ tabulated by Sukhatme [10], [11].
Neyman showed in particular that if pairs of normal populations with different $K$
are sampled ($n_1 + 1 = 13$, $n_2 + 1 = 7$), then the relative frequency of correct
statements about $m_1 - m_2$ based on the 5% points of $d$ will not be equal to the
expected .95 and will vary with $K$.

We consider here the following similar type of question: what is the nature of
discrepancies that will arise in the probability of Type I error by the repeated
use of the Behrens-Fisher test in sampling from two normal populations? We
observe that since $d$ and $v$ have the same structural form, the appropriate
probability of Type I error in such a situation will be given by the probability
integral of $v$ (Sections 2.2 and 2.5).


TABLE VI

Minimum and maximum† values of $P(|v| > d_0)$ for different values of $K$


K 

1 “ 

05 

1 

2 


do 


i ,05 

0021 

(«07 

0321 

.05 

2.F17 


! .0608 

.0329 

,0313 

.0329 

.0608 

2.435 

«i + 1 “ nj d" 1 = '•) 

1 .0,5 

.0.302 

t)316 

.0362 

.05 

2.306 


' ,0618 

.0367 

.0368 

.0367 

.0612 

2 292 

Til -t- 1 •= iis -1- 1 =■ IH 

1 .05 

.0-105 

(mif) 

.(M05 

.05 

2.179 


1 .0607 

,0434 

.0403 

.0434 

.0607 

2.170 

Til -(- 1 » 7, ni + L ” 1 

1 .o:«t7 

.0281 

.o;u7 

.0)193 

.06 

2.447 


.06 

.0480 

.0616 

.0697 

.0720 

2.179 

Tlj ^ ^*5 

i .05 

.05 

.05 

.05 

,06 

1.960 


† maximum values have been indicated in bold type


We observe that $P(|v| > x)$ is a monotone decreasing function of $x$ for any
fixed $K$, $n_1$ and $n_2$. Furthermore for fixed $x$, $n_1$ and $n_2$ we have $\partial P/\partial K \gtrless 0$ for (i)
$K \gtrless 1$, $n_1 = n_2$ and (ii) $K \gtrless n_1(n_1+1)/[n_2(n_2+1)]$, $n_1 \ne n_2$. Table VI gives the minimum
and maximum values of $P(|v| > d_0)$ for different values of $K$ where $d_0$ corresponds
to the highest and lowest value of tabulated $d$. It appears that for equal
sample sizes the minimum probability of Type I error is less than .05 and will
converge to .05 when $K$ is either infinity or zero. The maximum probability of
Type I error converges to a value slightly higher than .05. This probability also
converges to .05 with increasing size of equal samples for every $K$. For unequal
sample sizes, e.g. $n_1 < n_2$, the minimum values converge to .05 when $K = \infty$ and
if $n_1 > n_2$, this convergence takes place when $K = 0$. The maximum values
are both greater and less than .05.


3. Hypothesis of equality of regression coefficients when residual variances 
are unequal. 

3.1. Unbiasedness of tests based on statistics $t^*$ and $v^*$. Consider

$$t^* = (b_1 - b_2)\left[\frac{S(y - Y)^2 + S'(y' - Y')^2}{n_1 + n_2 - 2}\left(\frac{1}{M_1} + \frac{1}{M_2}\right)\right]^{-1/2}$$

and

$$v^* = (b_1 - b_2)\left[\frac{S(y - Y)^2}{M_1(n_1 - 1)} + \frac{S'(y' - Y')^2}{M_2(n_2 - 1)}\right]^{-1/2},$$

where $b_1$ and $b_2$ are regression coefficients calculated from independent samples; $Y$
and $Y'$ are the sample regression functions; $M_1 = S(x - \bar{x})^2$ and $M_2 = S'(x' - \bar{x}')^2$.
Under the assumptions of Section 1 these two quantities are distributed as

$$t^* = (\xi + \Delta)\,(\mu_1\chi^2_{1,n_1-1} + \mu_2\chi^2_{2,n_2-1})^{-1/2},$$

$$v^* = (\xi + \Delta)\,(\lambda_1\chi^2_{1,n_1-1} + \lambda_2\chi^2_{2,n_2-1})^{-1/2},$$

respectively, where $\xi$ is $N(0, 1)$ and the $\chi^2$'s have independent $\chi^2$-distributions
with d.o.f. indicated in the second subscripts, and where

$$M_1/M_2 = w, \qquad \Delta = (\beta_1 - \beta_2)\left(\frac{\sigma_1^2}{M_1} + \frac{\sigma_2^2}{M_2}\right)^{-1/2},$$

$$\mu_1 = K(w + 1)\,(K + w)^{-1}\,(n_1 + n_2 - 2)^{-1}, \qquad \lambda_1 = K\,(K + w)^{-1}\,(n_1 - 1)^{-1},$$

$$\mu_2 = (w + 1)\,(K + w)^{-1}\,(n_1 + n_2 - 2)^{-1}, \qquad \lambda_2 = w\,(K + w)^{-1}\,(n_2 - 1)^{-1}.$$




712 — 1 
7ii — r 


Consequently these two statistics have the same basic distribution as obtained
previously for $t_k$ (Section 2.1) and their power functions are monotone increasing
functions of the standardized 'distance' $\Delta$ for fixed values of $K$, $w$, $n_1$ and $n_2$.
While the statistic $t^*$ has "Student's" distribution with $n_1 + n_2 - 2$ d.o.f.
whenever $K = 1$, the statistic $v^*$ is only so distributed when $K = w(n_1 - 1)(n_2 - 1)^{-1}$.
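The two cases of an exact "Student's" distribution can be checked from the mixture coefficients alone: the denominator reduces to a single $\chi^2$ divided by its $n_1 + n_2 - 2$ d.o.f. exactly when both coefficients equal $(n_1 + n_2 - 2)^{-1}$. The sketch below assumes the coefficients $\mu_1 = K(w+1)/[(K+w)(n_1+n_2-2)]$, $\mu_2 = (w+1)/[(K+w)(n_1+n_2-2)]$, $\lambda_1 = K/[(K+w)(n_1-1)]$ and $\lambda_2 = w/[(K+w)(n_2-1)]$ with $w = M_1/M_2$; the function names are hypothetical.

```python
def tstar_coeffs(K, w, n1, n2):
    """(mu1, mu2): weights of the two chi-squares in the denominator of t*."""
    N = n1 + n2 - 2
    return K * (w + 1) / ((K + w) * N), (w + 1) / ((K + w) * N)

def vstar_coeffs(K, w, n1, n2):
    """(lambda1, lambda2): weights of the two chi-squares in the denominator of v*."""
    return K / ((K + w) * (n1 - 1)), w / ((K + w) * (n2 - 1))

n1, n2, w = 9, 5, 0.7
N = n1 + n2 - 2

# t* is exactly Student when K = 1, whatever the value of w.
print(tstar_coeffs(1.0, w, n1, n2), 1.0 / N)

# v* is exactly Student when K = w (n1 - 1) / (n2 - 1).
K_exact = w * (n1 - 1) / (n2 - 1)
print(vstar_coeffs(K_exact, w, n1, n2), 1.0 / N)

# With w = (n2 - 1) / (n1 - 1) the two statistics share the same coefficients.
w_same = (n2 - 1) / (n1 - 1)
print(tstar_coeffs(3.0, w_same, n1, n2), vstar_coeffs(3.0, w_same, n1, n2))
```

The ratio $\lambda_1/\lambda_2 = (K/w)(n_2-1)/(n_1-1)$ involves $K$ and $w$ only through $K/w$, whereas $\mu_1/\mu_2 = K$ while the common scale of $\mu_1$, $\mu_2$ still depends on $w$.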

3.2. Variations in the probability of Type I error and power function of $t^*$ and $v^*$.
The behavior of the partial derivatives of the probability of Type I error and
the power function of $t^*$ and $v^*$ w.r.t. $K$ and also in relation to $w$ is essentially
the same. For purposes of illustration we shall only consider the behavior of the
probability of Type I error. We shall presently see that for the hypothesis
$\beta_1 = \beta_2$ (cf. "Student's" hypothesis $m_1 = m_2$) while $t^*$ is sensitive to the variation
of $K$ and $w$, $v^*$ is insensitive to both.

3.2.1. Variations w.r.t. $K$ for fixed $w$. Remembering that the $\chi^2$'s in the
denominator of $t^*$ have respectively $n_1 - 1$ and $n_2 - 1$ d.o.f., we can write down
$P(t^* > t_0)$ from the corresponding form for $t_1$ (Section 2.3). After simplification
we obtain

(3.2.1.1) 


^ < Li[(7i2 - 1) - w(m - 1)] (K + ui) V-K (K < 1) 





whore zo = (H- If w niako iiso (if the, rolation P(ni, n-, Hh , , K) ~ 

P(.ni, nx, Mx, Mx, IC^) in (8.2.1.1) wo obtain 


(3.2.1.2) 


> UK H- «.) ' [(nj - 1) - - 1)] 


(K > 1 ). 


where Lx and Lx are certain po.sitivo. eonatantf) independent of il/i, Ah and K. 
Similarly for the btaliHti(! v* wc have 


(3.2.1.3) 


< DxiUUibh - 1) - u)(n, - m/iK + w) (K,j> < 1) 


(3 2.1.4) 


^ > Dilith - 1) - w(ni - l)(i>]/(K + w) 


iU > 1 ). 


•whore Dx and 1\ arc certain po.sitive constanl.s independent of K, Jl/i and AIx and 

where 4 > - AVe nolicc that if (i) nx - rix and w = 1 or (ii) w = j -, 

we have $t^* = v^*$ and both from (3.2.1.1), (3.2.1.2) and from (3.2.1.3), (3.2.1.4)
we obtain $\partial P/\partial K \lessgtr 0$ for $K \lessgtr 1$. In the case (i) the maximum probability of Type I
error occurs at $K = \infty$ and $K = 0$. In case (ii) the maximum will sometimes
occur for $K = 0$ and sometimes for $K = \infty$, depending on the relative magnitude
of $n_1$ and $n_2$.

For other situations $t^*$ and $v^*$ exhibit a type of behavior essentially similar
to that of $t_1$ and $v$ (Section 2.5). We notice that the $(P, K)$ curve for $v^*$ has a
minimum when $K = w(n_1 - 1)/(n_2 - 1)$. If $n_1 = n_2$, the minimum point is given by
$K = w$. Therefore with an approximate knowledge of $K$, a useful practical hint
to remember is to so adjust $M_1$ and $M_2$ as to have $w$ approximately equal to $K$.
If $n_1 \ne n_2$ any information about $\sigma_1^2$ being greater or less than $\sigma_2^2$ can be used
with decided advantage to adjust $M_1$, $M_2$, $n_1$ and $n_2$ so as to reduce considerably
the risk of the first kind and thus work in a region of the $(P, K)$ curve where
there is not much danger of bias in the probability of Type I error. This will
also reduce the fluctuations of the power function of $v^*$ about its minimum which
also occurs for $K = w(n_1 - 1)/(n_2 - 1)$.

3.2.2. Variations in relation to $w$ for fixed $K$. The partial derivative of $P(t^* > t_0)$
with respect to $w$ is given by

= Ml - + wU £ (1 - Kf 


(3.2.2.1) 


-zo)* 


+ I 


(7C < 1). 





Therefore $\partial P/\partial w > 0$ for $K < 1$. Similarly $\partial P/\partial w < 0$ for $K > 1$.

To justify the differentiation of the series in (3.2.2.1) we make use of the result




ni + Rj — 2 
2 


+ h,i -L, 


711 + 712 — 2 


+ h + 1, ^ 

t 

(1 - 2 d)* 


(ni+n2-2)l2+h 

2o 


'ni + 712 — 2 


+ ^ ^ 


Til “1“ 712 — 2 


+ h,^ 


and consequently the series under consideration may be shown to be dominated
by an absolutely and uniformly convergent series for $0 < K < 1$.

For the statistic $v^*$ consider

(3.2.2.2)
$$P(v^* > t_0) = \tfrac12\,(\lambda_1/\lambda_2)^{(n_2-1)/2} \sum_{h=0}^{\infty} \Gamma\!\left(\frac{n_2-1}{2} + h\right)(1 - \lambda_1/\lambda_2)^h \left[\Gamma\!\left(\frac{n_2-1}{2}\right)\Gamma(h+1)\right]^{-1} I_{y_0}\!\left(\frac{n_1+n_2-2}{2} + h, \tfrac12\right) \qquad (\lambda_1/\lambda_2 < 1),$$

where $y_0 = (1 + \lambda_1 t_0^2)^{-1}$. We notice from (3.2.2.2) and from the form of quantities
$\lambda_1$ and $\lambda_2$ (Section 3.1) that $P(v^* > t_0)$ depends on $K$ and $w$ only through the
product of $K$ and $1/w$. Consequently variations of $P$ w.r.t. $1/w$ for fixed $K$
are the same as those of $P$ w.r.t. $K$ for fixed $w$. Thus we may directly infer that
$P(v^* > t_0)$ will be insensitive to the variations of $w$. The following Table VII
will illustrate the nature of variations in the probability of Type I error in the
tests based on $t^*$ and $v^*$ in relation to $w$.


TABLE VII

Variations in the probability of Type I error of $t^*$ and $v^*$
($K = 2$; $n_1 = n_2 = 7$; $t_0 = 1.782$)


1 # 

9 

.25 

5 

1 

2 

«e 

P((* > ll) 

0269 

.0358 

.0427 

0512 

.0594 

.0866 

P{V* > U) 

0626 

0570 

.0539 

0512 

.06 

.0625 


It would appear that on the analogy of statistics $t_1$ and $v$ for the comparison of
two means one could guess about the sensitive nature of $t^*$ in relation to the
variations of the 'nuisance' parameter $K$. The additional drawback in $t^*$ which
stems from the monotone nature of its variations with respect to $w$ is a further
warning against the use of a $t^*$ type statistic for the hypothesis $\beta_1 = \beta_2$ when
$\sigma_1^2 \ne \sigma_2^2$.


4. Hypothesis of equality of two linear regression functions when variances 
are unequal. 

4.1. The statistic Z. (For notation refer to Sections 2.1 and 3.1). Consider the 
model given in Sections 1 and 3 for the comparison of two regression coefficients. 
If the variances are equal, the statistic based on the likelihood ratio criterion 
for the composite hypothesis α1 = α2 and β1 = β2 is given by 

V _ (^1 ~ d" l)(n.2 d- l)(wi d- Uj d" 2) * -f ~ + M^) ‘ 

Sjy - Yy + - FT 

The quantity Z is distributed like the ratio of two independently distributed χ²'s 
and consequently its distribution is precisely determined under the hypothesis. 
If σ1² ≠ σ2², Z can be put in the form of 

Z - (aixi.i d- Oixl-i) (Kxi,ni-i d* X4.n»-i) \ 

which is now distributed as the ratio of 'mixtures' of independently distributed 
χ²'s with d.o.f. indicated in the second subscripts and where 

ai =■ [ni d“ 1 d" Kiuj •+■ 1)] (ni d- wj -f- 2) \ 

Ui = {K w) (1 + w)~^. 


In the non-null case when α1 ≠ α2, β1 ≠ β2 the numerator of Z is a mixture of 
non-central squares. If we let P(K, w, δ, Δ, n1, n2) denote the power function 
of Z, following Robbins and Pitman [12] we obtain 


p{K, w, 5, A, ni. rii) 

(4.1.1) 


2 S Sc; dhPkIf " d- h — 1, fc -Hi d" 1 ') 

;«0 A-fl t-o \ 2 / 


(^K > 1, w < 


Til "H l\ 

nad- 1/’ 


cj 


(ai/tt2)*r(i-H^) 

Tim 


(1 - ai/ai)’, 


Vk 



= (z)“ = «* + aT 


H — (1 “H Za! 0 , 1 ) ^. 


where 






4.2. Variations in the probability of Type I error and the power function of Z. 
Corresponding to (4.1.1) we obtain the expression for the probability of Type I 
error P(Z > Z0) by putting D = 0 and k = 0. It has not been possible to establish 
any definite law concerning the behavior of the probability of Type I error 
and the power function w.r.t. the 'nuisance' parameter K. However we shall 
presently establish their monotone dependence on the variable parameter w. 

We differentiate P(Z > Z0) with respect to w and after simplification obtain 


er , (jc - - i'Y -i'/i - 

dw l'r(^) L 2 \ ao/ 02 \ 02 

r + 11 1 i 1 l^ - -- j.rO + 3/2) 


< 0 


for K > 1, w < (n1 + 1)/(n2 + 1). Similarly by utilizing an appropriate expression for 
P(Z > Z0) for K > 1, w > (n1 + 1)/(n2 + 1) we can show that ∂P/∂w < 0. For the case 
K < 1 it can be shown that P(Z > Z0) is a monotone increasing function of w. 
This is also true of the dependence of the power function of Z on w. 

4.3. Unbiasedness of Z. We differentiate (4.1.1) w.r.t. δ and Δ and after 
simplification obtain ∂P/∂δ > 0, ∂P/∂Δ > 0. Thus the power function of Z has a relative 
minimum at δ = 0, Δ = 0. 

The author is greatly indebted to Professors Harold Hotelling and William G. 
Madow for guidance in this research and to the referees for many useful 
suggestions and criticisms. 


REFERENCES 

[1] B. L. Welch, "The significance of the difference between two means when the 
population variances are unequal", Biometrika, Vol. 29 (1938), pp. 350-362. 

[2] M. G. Kendall, The Advanced Theory of Statistics, Vol. 2. 

B| P,L"H™^oLliibul.on,olh.lh.o,y.f"6tud.ntV'Ma.t«^ 

SSbCn.St’.«.l i..™...", A-* eu,,,,,,. 

Soc , Supp; Vol 2 (1936), pp lor-180 


R 


[7] J 



[8] J. E. Walsh, "On the power efficiency of a t-test formed by pairing sample values", 
Annals of Math. Stat., Vol. 18 (1947), pp. 601-604. 
[9] J. Neyman, "Fiducial argument and the theory of confidence intervals", Biometrika, 
Vol. 32 (1941), pp. 128-150. 

[10] P. V. Sukhatme, "On Fisher and Behrens' test of significance for the difference in 
means of two normal samples", Sankhyā, Vol. 4 (1938), pp. 39-48. 
[11] R. A. Fisher and F. Yates, Statistical Tables, Oliver and Boyd, 1948. 

[12] H. Robbins and E. J. G. Pitman, "Application of the method of mixtures to quadratic 
forms in normal variates", Annals of Math. Stat., Vol. 20 (1949), pp. 552-560. 



THE EXTREMAL QUOTIENT 

By E. J. Gumbel and R. D. Keeney 
New York City and Metropolitan Life Insurance Company 

Summary. The extremal quotient is defined as the ratio of the largest to the 
absolute value of the smallest observation. Its analytical properties for 
symmetrical, continuous and unlimited distributions are obtained from a study of 
the auto-quotient defined as the ratio of two non-negative variates with 
identical distributions. The relation of the two statistics is established by proving 
that, for sufficiently large samples from an initial distribution with median zero, 
the largest (or smallest) value may be assumed to be positive (or negative) 
and that the extremes are independent. It follows that the distribution and the 
probability of the extremal quotient possess certain symmetries, and that its 
median is unity. As many moments exist for the extremal quotient as moments 
and reciprocal moments exist simultaneously for the initial variate. The 
logarithm of the extremal quotient is symmetrically distributed. These properties 
hold for all continuous symmetrical unlimited variates which possess a 
monotonically increasing probability function. 

For the exponential type, the asymptotic distribution of the extremal 
quotient can only be expressed by an integral. In this case, no moments exist. For 
the Cauchy type, the asymptotic distribution is very simple, and the logarithm 
of the extremal quotient has the same distribution as the midrange for initial 
distributions of the exponential type. 

It is not necessary to consider asymmetrical distributions since, in this case, 
for sufficiently large samples, one of the extremes will outweigh the other, 
unless the distribution is nearly symmetrical or has rapidly varying tails. 


1. The auto-quotient and the extremal quotient. Let x and y be two 
independent non-negative continuous variates, unlimited to the right. Let $f_1(x)$ and 
$f_2(y)$ be the distributions (probability densities), and let $F_1(x)$ and $F_2(y)$ be 
the probability functions. Then the joint distribution of the two variates is 
their product. The quotient 

(1.1)    $Q = x/y$ 

is also non-negative and unlimited to the right. Since 

$$\frac{dx}{dQ} = y,$$

the joint distribution w(y, Q) of the quotient Q and the variate y is 

(1.2)    $w(y, Q) = f_1(yQ)\, f_2(y)\, y,$ 

and the marginal distribution h(Q) of the variate Q alone becomes 

(1.3)    $h(Q) = \int_0^\infty y\, f_1(yQ)\, f_2(y)\, dy.$ 

The quotient Q possesses a mode if (and only if) $f_1(x)$ possesses a mode. 

Assume now that the two variates x and y have the same distribution 

(1.4)    $f_1(x) = f(x); \quad f_2(y) = f(y)$ 

with the same parameter values. The quotient of two variates with identical 
distributions is henceforth called the auto-quotient $q_a$. It may be realized if there 
are two independent series of observations taken from the same population and 
ordered in time. Each value from the first series is divided by the corresponding 
value from the second series. Another realization consists in dividing each value 
obtained in one series of independent observations by every other value. A 
third realization is obtained by considering two asymmetrical distributions 
$f_1(x)$ and $f_2(y)$ where x ≥ 0, y ≤ 0, and 

(1.4')    $f_2(y) = f_1(-y).$ 

The two distributions are called mutually symmetrical, and the auto-quotient 
is 

$$q_a = x/(-y).$$

From the definition of the auto-quotient it follows that the distribution of $q_a$ 
must be the same as the distribution of its reciprocal $r = 1/q_a$. The proof of this 
statement is simple. Under the condition (1.4), the distribution $h(q_a)$ becomes, 
from (1.3) 

(1.5)    $h(q_a) = \int_0^\infty y\, f(y q_a)\, f(y)\, dy.$ 

The distribution $h_1(r)$ of the reciprocal is 

$$h_1(r) = \frac{1}{r^2} \int_0^\infty y\, f(y/r)\, f(y)\, dy.$$

If y/r is replaced by x, the distribution of r is 

(1.6)    $h_1(r) = h(q_a).$ 

Thus, the distribution of the auto-quotient of a non-negative unlimited variate 
is invariant under a reciprocal transformation. 

The shape of the distribution $h(q_a)$ and the location of the mode may be 
obtained from the density of probability $h(1/q_a)$ at the value $1/q_a$ (which differs, 
of course, from the distribution $h_1(r)$ of $r = 1/q_a$). From (1.6) we obtain 

$$h(1/q_a) = \int_0^\infty y\, f(y/q_a)\, f(y)\, dy.$$





The transformation 

$$y/q_a = z, \qquad dy = q_a\, dz,$$

leads to 

(1.7)    $h(1/q_a) = q_a^2\, h(q_a).$ 

This is a symmetry relation for the distribution of the auto-quotient of a 
non-negative unlimited variate. If $q_a$ is larger than unity, 

(1.8)    $h(1/q_a) > h(q_a).$ 

If the distribution $h(q_a)$ is continuous for all values of $q_a$, the derivative of 
equation (1.7) with respect to $q_a$ leads, for $q_a = 1$, to 

(1.9)    $h'(1) = -h(1).$ 

If the distribution $h(q_a)$ possesses a unique mode, it must be less than unity. 
The moments $\overline{q_a^k}$ are, from (1.5) 

$$\overline{q_a^k} = \int_0^\infty \int_0^\infty q_a^k\, y\, f(q_a y)\, f(y)\, dy\, dq_a
= \int_{y=0}^\infty \frac{f(y)}{y^k} \int_{q_a y = 0}^\infty (q_a y)^k f(q_a y)\, d(q_a y)\, dy.$$

The inner integral is the moment $\overline{y^k}$ of order k of the initial variate y, and the 
remaining integral is its reciprocal moment $\overline{y^{-k}}$ of order -k. Thus 

(1.10)    $\overline{q_a^k} = \overline{y^k} \cdot \overline{y^{-k}} = \overline{q_a^{-k}}.$ 

The moments of order k and of order -k of $q_a$ exist if the moments and the 
reciprocal moments of order k for the initial variate exist simultaneously. The 
second equation in (1.10) also follows immediately from the invariance of $q_a$ 
under a reciprocal transformation. Even if the initial distribution possesses all 
moments, the mean $\overline{q_a}$ need not exist, and the same holds, of course, for the mean 
error and the higher moments. The procedure, usual in economic and 
meteorological statistics, of calculating the quotients of two series of independent 
positive variables in order to test whether this ratio is constant may be misleading, 
especially if the two series happen to be samples taken from the same population. 
The theoretical mean need not exist, and the calculated mean of the observed 
quotients need not characterize the relation between the two series. 

The probability function H(Q) of the quotient Q obtained from (1.3) is 

$$H(Q) = \int_0^Q \int_0^\infty y\, f_1(zy)\, f_2(y)\, dy\, dz.$$

Change of the order of integration leads to 

$$H(Q) = \int_0^\infty f_2(y)\, F_1(Qy)\, dy.$$





The probability function $H(q_a)$ of the auto-quotient obtained from (1.4) is 

(1.11)    $H(q_a) = \int_0^\infty F(q_a y)\, dF(y).$ 

Integration by parts leads to 

(1.12)    $H(q_a) = 1 - q_a \int_0^\infty F(y)\, f(q_a y)\, dy.$ 

The boundary condition, H(0) = 0; H(∞) = 1 can immediately be verified if 
the preceding equation is written in the form 

(1.13)    $H(q_a) = 1 - \int_0^\infty F(z/q_a)\, f(z)\, dz.$ 

The probability $H(q_a)$ possesses a symmetry relation which is analogous to 
(1.7). The probability at the value $1/q_a$ is, from (1.11), 

$$H(1/q_a) = \int_0^\infty F(y/q_a)\, f(y)\, dy.$$

If we introduce the variable of integration 

$$y = q_a z,$$

we obtain from (1.12) 

(1.14)    $H(q_a) = 1 - H(1/q_a).$ 

If $q_a$ is any quantile, such that $H(q_a) = P$, its reciprocal $1/q_a$ has the probability 
1 - P. The first quartile (decile) is the reciprocal of the third quartile (ninth 
decile), and so on. 

For $q_a = 1$, equation (1.14) leads to 

(1.14')    $H(1) = \tfrac{1}{2}.$ 

The median of the auto-quotient of a positive unlimited variate is unity. From 
(1.9) it follows that the median surpasses the mode, if a unique mode exists. 
Finally, equation (1.14) may be used to construct a symmetrical distribution. 
If a new variate 

(1.15)    $z = \lg q_a$ 

with the probability function H*(z) is introduced, the symmetry relation (1.14) 
becomes 

(1.16)    $H^*(z) = 1 - H^*(-z).$ 

The logarithm of the auto-quotient of a positive unlimited variate has a 
symmetrical distribution about median zero. The geometric mean of $q_a$ exists and is 
equal to unity. 
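
The properties just derived can be illustrated by simulation. The following is a minimal sketch, assuming (for illustration only) that the initial variate is exponential with unit mean; for that choice (1.11) gives $H(q_a) = q_a/(1 + q_a)$. The function names and the sample size are hypothetical and are not part of the paper.

```python
import math
import random


def auto_quotients(n_pairs, seed=0):
    """Draw n_pairs of independent Exp(1) variates and return the quotients x / y."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0) / rng.expovariate(1.0) for _ in range(n_pairs)]


def empirical_H(quotients, q):
    """Empirical probability function: fraction of quotients not exceeding q."""
    return sum(1 for v in quotients if v <= q) / len(quotients)


def sample_median(values):
    """Central value of a list of numbers."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else 0.5 * (s[mid - 1] + s[mid])


qs = auto_quotients(100_000)
# Median should be close to unity, see (1.14').
print("median of q_a :", round(sample_median(qs), 3))
# Symmetry relation (1.14): H(q) + H(1/q) should be close to 1.
print("H(2) + H(1/2) :", round(empirical_H(qs, 2.0) + empirical_H(qs, 0.5), 3))
# The logarithm is symmetrical about zero, so its mean should be close to 0.
print("mean of lg q_a:", round(sum(math.log(v) for v in qs) / len(qs), 3))
```

The arithmetic mean of the simulated quotients is deliberately not printed: for this initial distribution the reciprocal moment of order one does not exist, so by (1.10) the mean of the auto-quotient diverges.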





These results hold if each observed value of a non-negative unlimited variate 
is divided by each other observed value. They do not hold for the quotients of 
two specific order statistics because, in general, the fundamental assumption of 
independence no longer holds. However, some consequences for the quotients 
of extreme mth values may be deduced. 

Consider a symmetrical unlimited variate. Then the distribution 
of the mth smallest value ${}_m x$, and the distribution of the mth largest value 
$x_m$ are mutually symmetrical in the sense of (1.4'). Therefore the extremal 
quotient 

(1.17)    $q_m = x_m / (-{}_m x)$ 

may be interpreted as an auto-quotient provided that 1) the probability for 
$x_m$ to be negative, and ${}_m x$ to be positive, may be neglected; 2) the distributions 
of the mth smallest and the mth largest values are independent. Under these 
conditions the distribution, the moments, and the probability function of the 
extremal quotient are obtained from (1.5), (1.10), and (1.11) respectively, if 
the initial distribution f(y) is replaced by the distribution of the mth largest 
values $\varphi_m(x_m)$. The symmetry relations (1.7) and (1.14) and their consequence, 
that the median is equal to unity, hold in particular for m = 1, i.e. for the 
extremal quotient proper. 

The validity of the two conditions has now to be established. 

a) Consider a symmetrical distribution f(x) with median zero. Then the 
probability that the largest among n observations, $x_n$, exceeds a 
certain x, is $1 - F^n(x)$. The probability P that the largest among n values is 
positive, i.e. larger than the median, is 

(1.18)    $P = 1 - 2^{-n}.$ 

If n is sufficiently large, this probability differs from unity by an amount that 
can be made as small as we please. Even for relatively small samples, say n = 20, 
the probability that the largest value will be positive is of the order $1 - 10^{-6}$. 
Thus, we expect only one largest value in a million samples of size 20 to be 
negative. The same argument shows that the smallest value $x_1$ may be expected to 
be negative. Thus the postulate 

(1.19)    $x_n \ge 0; \quad x_1 \le 0,$ 

is a very weak restriction upon the sample size. If m is sufficiently small, the 
same result holds for the mth extremes. 
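
The order of magnitude quoted for n = 20 can be checked directly from (1.18). The sketch below is only an arithmetic illustration; the function name is hypothetical.

```python
def prob_largest_positive(n):
    """Probability (1.18) that the largest of n observations exceeds the median."""
    return 1.0 - 2.0 ** (-n)


for n in (5, 10, 20):
    p = prob_largest_positive(n)
    # Expected number of samples per million whose largest value is negative.
    per_million = 1e6 * (1.0 - p)
    print(f"n = {n:2d}: P = {p:.9f}, negative maxima per million samples = {per_million:.3f}")
```

For n = 20 the expected number of negative largest values per million samples is slightly below one, in agreement with the statement in the text.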

b) It is known [7] that the joint distribution of the extremes $(x_1, x_n)$ taken 
from an initial distribution of the exponential type converges, for sufficiently 
large samples, toward the product of the asymptotic distribution $\varphi(x_n)$ of the 
largest value, and that of the smallest value. A similar theorem will now be 
proven for a general class of continuous distributions. 





Let ${}_m x$ be the mth smallest observation; let $x_l$ be the lth largest observation 
where m and l are small compared to n, n being large. Then the joint distribution 
of ${}_m x$ and $x_l$ is 

(1.20)    $\dfrac{n!}{(m-1)!\,(l-1)!\,(n-m-l)!}\; F({}_m x)^{m-1} \left(F(x_l) - F({}_m x)\right)^{n-m-l} \left(1 - F(x_l)\right)^{l-1} f({}_m x)\, f(x_l).$ 

Now the transformation 

(1.21)    $n(1 - F(x_l)) = \eta; \quad nF({}_m x) = \xi, \qquad 0 \le \xi \le n, \quad 0 \le \eta \le n,$ 

due to Cramér ([1], p. 371) is used. Then the joint distribution $v_n(\xi, \eta)$ of the 
new variates ξ and η becomes 

$$v_n(\xi, \eta) = \frac{n!}{(m-1)!\,(l-1)!\,(n-m-l)!\; n^{m+l}}\; \xi^{m-1} \eta^{l-1} \left(1 - \frac{\xi + \eta}{n}\right)^{n-m-l},$$

where m + l is small compared to n. As n increases, $v_n(\xi, \eta)$ converges to 

$$\frac{\xi^{m-1} e^{-\xi}}{(m-1)!} \cdot \frac{\eta^{l-1} e^{-\eta}}{(l-1)!},$$

so that in the limit ξ and η are independent. If now the mild restriction is 
imposed that F(x) be monotonically increasing, (1.21) defines a one to one 
transformation, and therefore there must exist an inverse function uniquely defining 
${}_m x$ as a function of ξ and $x_l$ as a function of η. From the limiting independence 
of ξ and η the limiting independence of the extremes ${}_m x$ and $x_l$ follows at once. 

Thus the second condition is fulfilled, and the mth extremal quotient shares 
all properties of the auto-quotient. This holds also for initial symmetrical 
distributions which do not possess asymptotic distributions of the extremes. 

In the following, the two types of initial distributions of an unlimited variate 
are considered for which asymptotic distributions of the extremes exist, namely, 
the exponential and the Cauchy type. For simplicity, only the extremal quotient 
proper, designated by q, is studied. The two asymptotic probabilities of the 
extremal quotients for these symmetrical distributions are obtained by 
introducing the asymptotic distributions of the largest value into the probability 
function (1.11) of the auto-quotient. 


2. Application to the exponential type. For symmetrical distributions of the 
exponential type the asymptotic distribution of the largest value is 

(2.1)    $\varphi(x) = \alpha \exp\left[-\alpha(x - u) - e^{-\alpha(x-u)}\right]$ 

where u and α are defined in terms of the initial probability F(x) and the initial 
distribution f(x) by 

(2.2)    $F(u) = 1 - 1/n; \quad \alpha = n f(u),$ 

n being the sample size. The distribution (2.1) will now be simplified by 
introducing a new parameter λ defined by 

(2.3)    $\lambda = e^{\alpha u}, \qquad \lambda > 0.$ 





To see the meaning of λ, consider Laplace's first distribution, then the 
so-called logistic [6], and the normal distributions, all of which are of the exponential 
type. In the first two cases we obtain, from (2.2), after some calculations, 

(2.4)    $\alpha = 1,\ u = \lg n - \lg 2; \qquad \alpha = 1 - 1/n,\ u = \lg(n - 1),$ 

whereas for the normal distribution, we have asymptotically 

$$\alpha = u = \sqrt{2 \lg\left(n/\sqrt{2\pi}\right)}$$

and 

(2.4')    $\lambda = n^2/(2\pi).$ 

For these distributions, and interpreted in this sense, λ is of the order of the 
sample size or its square. 
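
The order of magnitude of the parameter can be made concrete for a given sample size. The sketch below assumes that the parameter of (2.3) is $\lambda = e^{\alpha u}$ (the form consistent with (2.5)) and uses the values of α and u quoted in (2.4) and in the asymptotic normal case; the function names are hypothetical.

```python
import math


def lam_laplace(n):
    """Laplace's first distribution: alpha = 1, u = lg n - lg 2, so lambda = n / 2."""
    alpha, u = 1.0, math.log(n) - math.log(2.0)
    return math.exp(alpha * u)


def lam_logistic(n):
    """Logistic distribution: alpha = 1 - 1/n, u = lg(n - 1)."""
    alpha, u = 1.0 - 1.0 / n, math.log(n - 1)
    return math.exp(alpha * u)


def lam_normal(n):
    """Normal distribution, asymptotic values alpha = u = sqrt(2 lg(n / sqrt(2 pi)))."""
    u = math.sqrt(2.0 * math.log(n / math.sqrt(2.0 * math.pi)))
    return math.exp(u * u)  # equals n**2 / (2 pi)


for n in (20, 100, 1000):
    print(f"n = {n:4d}: Laplace {lam_laplace(n):10.2f}, "
          f"logistic {lam_logistic(n):10.2f}, normal {lam_normal(n):12.2f}")
```

For the first two distributions the parameter grows like n, for the normal distribution like the square of n, as stated in the text.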

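From (2.3) and (2.1) the distribution $\varphi(x)$ and the probability function $\Phi(x)$ 
are 

(2.5)    $\varphi(x) = \alpha\lambda \exp\left[-\alpha x - \lambda e^{-\alpha x}\right]; \quad \Phi(x) = \exp\left[-\lambda e^{-\alpha x}\right].$ 

In order to fulfill the condition (1.19), namely $\Phi(0) = 0$, the distribution $\varphi(x)$ 
must be truncated at x = 0. This leads to the truncated distribution $\varphi_t(x)$ and 
the truncated probability $\Phi_t(x)$ where 

(2.6)    $\varphi_t(x) = \dfrac{\alpha\lambda \exp\left[-\alpha x - \lambda e^{-\alpha x}\right]}{1 - e^{-\lambda}}; \quad \Phi_t(x) = \dfrac{\exp\left[-\lambda e^{-\alpha x}\right] - e^{-\lambda}}{1 - e^{-\lambda}}.$ 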

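The asymptotic probability function $H_\lambda(q)$ for the extremal quotient of a 
symmetrical variate of the exponential type is now obtained from (1.11), if y, f(y), 
and F(y), are replaced by x, $\varphi_t(x)$ and $\Phi_t(x)$, respectively, and the index a is 
dropped. Consequently, from (2.6), 

$$H_\lambda(q) = \frac{1}{(1 - e^{-\lambda})^2} \int_0^\infty \alpha\lambda \exp\left[-\alpha x - \lambda e^{-\alpha x} - \lambda e^{-\alpha q x}\right] dx
- \frac{e^{-\lambda}}{(1 - e^{-\lambda})^2} \int_0^\infty \alpha\lambda \exp\left[-\alpha x - \lambda e^{-\alpha x}\right] dx.$$

The transformation 

$$e^{-\alpha x} = z; \qquad \alpha e^{-\alpha x}\, dx = -dz$$

leads to 

(2.7)    $H_\lambda(q) = \dfrac{\lambda}{(1 - e^{-\lambda})^2} \int_0^1 e^{-\lambda(z + z^q)}\, dz - \dfrac{e^{-\lambda}}{1 - e^{-\lambda}}.$ 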

This probability of the extremal quotient for initial symmetrical distributions 
of the exponential type is not truly asymptotic since the parameter λ depends 
upon n. (See Addendum). 

Unfortunately, the expression (2.7) cannot be integrated in closed form. Therefore the 
probability function has to be studied in an analytic way. For this purpose we first 
recall the general properties 

$$H_\lambda(0) = 0; \quad H_\lambda(1) = \tfrac{1}{2}; \quad H_\lambda(\infty) = 1,$$

valid for any value of λ. Furthermore, for any λ, we have the symmetry 
relation (1.14). These properties can be verified at once from (2.7). 
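
These properties can also be checked numerically. The sketch below assumes that (2.7) has the form $H_\lambda(q) = \lambda (1 - e^{-\lambda})^{-2} \int_0^1 e^{-\lambda(z + z^q)} dz - e^{-\lambda} (1 - e^{-\lambda})^{-1}$, which is the form obtained from (2.6) and (1.11), and evaluates the integral by a composite Simpson rule. The function names and the number of subintervals are hypothetical choices.

```python
import math


def simpson(f, a, b, n=20_000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def H_lambda(q, lam):
    """Probability function of the extremal quotient, assumed form of (2.7)."""
    integral = simpson(lambda z: math.exp(-lam * (z + z ** q)), 0.0, 1.0)
    norm = 1.0 - math.exp(-lam)
    return lam * integral / norm ** 2 - math.exp(-lam) / norm


lam = 8.0
print("H(0)          :", round(H_lambda(0.0, lam), 6))   # boundary value, should be 0
print("H(1)          :", round(H_lambda(1.0, lam), 6))   # median, should be 1/2
print("H(2) + H(1/2) :", round(H_lambda(2.0, lam) + H_lambda(0.5, lam), 6))  # symmetry (1.14)
```

Simpson's rule is used only to keep the example free of external libraries; for exponents q below unity the integrand has an unbounded derivative at z = 0, so a fine grid is chosen.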





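The numerical values of $H_\lambda(q)$ can easily be calculated for q = 1/2 and q = 2. 
Consider a value of λ, say of the order 6. Then formula (2.7) may be written 

$$H_\lambda(2) \approx \lambda \int_0^1 e^{-\lambda(z + z^2)}\, dz = \lambda e^{\lambda/4} \int_0^1 e^{-\lambda(z + \frac{1}{2})^2}\, dz.$$

If we introduce 

$$\sqrt{2\lambda}\left(z + \tfrac{1}{2}\right) = t; \qquad \sqrt{2\lambda}\, dz = dt,$$

the probability $H_\lambda(2)$ becomes a difference of two normal probability integrals, 

$$H_\lambda(2) \approx \sqrt{\pi\lambda}\; e^{\lambda/4} \left[1 - F\left(\sqrt{\lambda/2}\right) - \left(1 - F\left(3\sqrt{\lambda/2}\right)\right)\right],$$

where F stands for the normal probability function. 

The second expression may be neglected compared to the first one for λ ≥ 4, 
whence 

(2.9)    $H_\lambda(2) \approx \sqrt{\lambda/2}\; e^{\lambda/4} \int_{\sqrt{\lambda/2}}^\infty e^{-t^2/2}\, dt.$ 

The symmetry relation (1.14) leads to the knowledge of $H_\lambda(\tfrac{1}{2})$. Thus the three 
probabilities $H_\lambda(\tfrac{1}{2})$, $H_\lambda(1)$, and $H_\lambda(2)$ are known. 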

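To see the influence of λ on $H_\lambda(2)$, we use a method due to R. D. Gordon [4]. 
This author considers a function R, defined by 

(2.10)    $R = e^{x^2/2} \int_x^\infty e^{-t^2/2}\, dt, \qquad x > 0,$ 

and proves that 

$$\frac{dR}{dx} = xR - 1 < 0; \qquad \frac{d^2R}{dx^2} = x\frac{dR}{dx} + R > 0.$$

It follows that 

$$\frac{d(xR)}{dx} > 0.$$

If we substitute $\sqrt{\lambda/2}$ for x this inequality may be written, from (2.9) and (2.10), 

$$\frac{d(xR)}{dx} = 2\sqrt{2\lambda}\; \frac{dH_\lambda(2)}{d\lambda} > 0.$$

Consequently $H_\lambda(2)$ increases with λ whereas, from (1.14), the probability 
$H_\lambda(\tfrac{1}{2})$ decreases with λ. The following table gives the probabilities $H_\lambda(2)$ and 
$H_\lambda(\tfrac{1}{2})$ obtained from (2.9) and (1.14), and their differences 

(2.11)    $P_\lambda(2) = H_\lambda(2) - H_\lambda(\tfrac{1}{2}).$ 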





Asymptotic probabilities of the extremal quotient for symmetrical distributions of 
the exponential type 

Parameter     Probabilities (2.9), (1.14)     Probability (2.11) 
    λ         H_λ(2)        H_λ(1/2)          P_λ(2) 

    8         .84376        .15624            .68752 
   18         .91377        .08623            .82754 
   32         .94661        .05339            .89322 
   50         .96438        .03562            .92876 
   72         .97427        .02573            .94854 
   98         .98087        .01913            .96174 
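
The tabulated probabilities can be compared with a direct evaluation of the normal-integral approximation. The sketch below assumes the one-term form $H_\lambda(2) \approx \sqrt{\pi\lambda}\, e^{\lambda/4} (1 - F(\sqrt{\lambda/2}))$, with F the normal probability function computed from the complementary error function; the function names are hypothetical. Under this assumption the computed values agree with the table to within about 0.001.

```python
import math


def normal_tail(x):
    """Upper tail 1 - F(x) of the standard normal probability function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def H2_approx(lam):
    """One-term approximation to H_lambda(2), assumed equivalent to (2.9)."""
    return math.sqrt(math.pi * lam) * math.exp(lam / 4.0) * normal_tail(math.sqrt(lam / 2.0))


for lam in (8, 18, 32, 50, 72, 98):
    h2 = H2_approx(lam)
    # H(1/2) follows from the symmetry relation (1.14), P(2) from (2.11).
    print(f"lambda = {lam:2d}: H(2) ~ {h2:.5f}, H(1/2) ~ {1.0 - h2:.5f}, P(2) ~ {2.0 * h2 - 1.0:.5f}")
```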


The approximate shape of $H_\lambda(q)$ is traced, for λ = 8, ..., 98, and 1/2 < q < 2 
in Graph (1). Since we know from (1.16) that lg q has a symmetrical 
distribution, we use a logarithmically normal probability paper where q is plotted on 
the abscissa in a logarithmic scale, and $H_\lambda(q)$ is plotted on the ordinate in a 
normal probability scale. The probability $P_\lambda(2)$ for any value of q to be 
contained in the interval 1/2 < q < 2 increases with λ, i.e., with the sample size, and 
the distribution of the extremal quotient contracts. 


1) ASYMPTOTIC PROBABILITY OF THE EXTREMAL 
QUOTIENT FOR THE EXPONENTIAL TYPE 

If the initial distribution is unknown, the parameter λ has to be estimated 
from the observed extremal quotients. Equation (2.11) may be used for this 
purpose. We calculate the observed relative frequency of extremal 
quotients contained between q = 1/2 and q = 2, and substitute it for the probability 
$P_\lambda(2)$. To facilitate this estimate of λ, we trace $P_\lambda(2)$ against λ in graph (2). 
The probability $P_\lambda(2)$ is traced on the ordinate in linear scale, and the parameter 
λ is traced on the abscissa in inverse scale. Thus λ is easily estimated from the 
observed relative frequency. 

2) ESTIMATION OF THE PARAMETER λ 



The distribution $h_\lambda(q)$ of the extremal quotient obtained by differentiating 
the probability function (2.7) with respect to q is 

(2.12)    $h_\lambda(q) = \dfrac{\lambda^2}{(1 - e^{-\lambda})^2} \int_0^1 e^{-\lambda(z + z^q)}\, z^q\, (-\lg z)\, dz.$ 

The symmetry relation (1.7) is easily verified. We now investigate the boundary 
value $h_\lambda(0)$ and prove that 

(2.13)    $\lim_{q \to 0} h_\lambda(q) = h_\lambda(0).$ 

This is not obvious, since $z^q$ becomes indeterminate if both z and q vanish. For 
the proof of (2.13), consider the integral 

(2.14)    $I = \lambda \int_0^1 e^{-\lambda z} (-\lg z)\, dz$ 



or 


(2.15) I — (1 —e ^)lgX—7 + e ’'IgX — «(—X). 

The last term, the exponential integral, is positive. The value of $h_\lambda(0)$ is thus, 
from (2.12) 


(2.16) 

The difference 


/ix(0) 


Xe~’'(lg X — 7 — m(—X)) 

(1 - e-'i-y 


$$\Delta = (1 - e^{-\lambda})^2 \left(h_\lambda(q) - h_\lambda(0)\right)$$

becomes, from (2.12), (2.15) and (2.16), by the use of the mean value theorem 
and after expansion 

$$\Delta = f(\lambda) \int_0^1 \left(e^{-\lambda z^q} z^q - e^{-\lambda}\right) dz
= f(\lambda) \sum_{v=0}^\infty \frac{(-1)^v \lambda^v}{v!} \left(\frac{1}{(v+1)q + 1} - 1\right)$$

where f(λ) is a positive function. Since the series is absolutely convergent, the 
difference Δ vanishes for q = 0, and the density of probability for q = 0 is given 
by (2.16). The condition $h_\lambda(0) \ge 0$, valid for any distribution, is met provided 
that 

(2.17)    λ > 1.794 


By virtue of (2.4) this is a (weak) condition concerning the sample size. From 
(2.16) it follows that $h_\lambda(0)$ does not vanish although its numerical value is very 
small. 

The existence of at least one mode follows from the fact that the distribution 
$h_\lambda(q)$ is continuous, very small for q = 0, and vanishes for q = ∞. Equation 
(1.9) proves that any mode is inferior to unity. The distribution contracts for 
increasing values of the parameter. Therefore the mode approaches the median 
with increasing sample size. 

Since the distributions of the exponential type do not possess reciprocal 
moments it follows from (1.10) that the distribution $h_\lambda(q)$ does not possess moments. 
The mean extremal quotient $\bar{q}$ diverges. Because the logarithmically normal 
distribution used in graph (1) as first approximation to the distribution $h_\lambda(q)$ 
possesses all moments, the distribution $h_\lambda(q)$ has a much longer tail than the 
logarithmically normal one. 


3. Application to the Cauchy type. For the exponential type, the asymptotic 
distribution of the extremal quotient can only be expressed in the form of an 
integral containing a parameter λ which is a function of the sample size. For the 
Cauchy type, to be defined in the following, the asymptotic distribution will 
turn out to be very simple. 





A distribution of a variate x ≥ 1 was said [5] to be of the Pareto type if 

(3.1)    $\lim_{x \to \infty} x^k (1 - F(x)) = A; \quad k > 0; \quad A > 0.$ 

We now say that a variate is of the Cauchy type if it is unlimited, continuous, 
subject to (3.1), and symmetrical about zero. Distributions of the Pareto and 
the Cauchy type do not possess moments of an order equal to or larger than k. 
However, not all unlimited symmetrical distributions with a finite number of 
moments are of the Cauchy type. 

The simplest example of such a distribution is the Cauchy distribution itself 

(3.2)    $f(x) = \dfrac{1}{\pi(1 + x^2)}; \quad F(x) = \dfrac{1}{2} + \dfrac{1}{\pi}\, \text{arc tg}\, x,$ 

which possesses no moments. For large absolute values of x, the usual expansion 
leads to 

$$F(x) = 1 - \frac{1}{\pi x} + O(x^{-3}); \qquad F(-x) = \frac{1}{\pi x} - O(x^{-3}).$$

If the factors $O(x^{-3})$ are neglected, the parameters A and k in (3.1) are 

(3.2')    $A = 1/\pi; \quad k = 1.$ 

For the Cauchy type, the asymptotic probability Π(x) and distribution π(x) 
of the largest value $x = x_n$ established by Fréchet [3], R. A. Fisher [2] and R. von 
Mises [8] are 

(3.3)    $\Pi(x) = \exp\left[-\left(\dfrac{u}{x}\right)^k\right]; \quad \pi(x) = \dfrac{k}{u} \left(\dfrac{u}{x}\right)^{k+1} \exp\left[-\left(\dfrac{u}{x}\right)^k\right],$ 

where u is defined by (2.2). 

The condition (1.19) is fulfilled for any sample size which is so large that the 
asymptotic distribution of the extremes may be used. The asymptotic 
probability $H_k(q)$ of the extremal quotient for the Cauchy type is obtained from (1.11), 
if y, f(y) and F(y) are replaced by x, π(x), and Π(x), respectively, where the 
indices n and a are omitted. Consequently, from (3.3), 

$$H_k(q) = \int_0^\infty \frac{k}{u} \left(\frac{u}{x}\right)^{k+1} \exp\left[-\left(\frac{u}{x}\right)^k \left(1 + q^{-k}\right)\right] dx.$$

From the transformation 

$$\left(\frac{u}{x}\right)^k = z,$$

the asymptotic probability $H_k(q)$ and the asymptotic distribution $h_k(q)$ of the 
extremal quotient become simply 

(3.4)    $H_k(q) = \dfrac{1}{1 + q^{-k}}; \quad h_k(q) = \dfrac{k q^{k-1}}{(1 + q^k)^2}.$ 




Evidently, the symmetry relations (1.7) and (1.14) are fulfilled for any k. The 
graphs (3) and (4) show the distribution $h_k(q)$ and the probability $H_k(q)$ for 
the most interesting cases k = 1, 2, 3. From 

$$\frac{\partial H_k(q)}{\partial k} = H_k(q)\left(1 - H_k(q)\right) \lg q$$

it follows: For k increasing, the probability $H_k(q)$ decreases for q < 1, and 
increases for q > 1. The distribution contracts with increasing values of the parameter 
k as shown in the graphs (3) and (4). The more moments that exist in the initial 
distribution, the more concentrated is the distribution of the extremal quotient. 





The density of probability 

$$h_k(1) = k/4$$

of the median obtained from (3.4) and (1.14') increases with k. The mode $\tilde{q}$ of 
the extremal quotient is obtained from (3.4). For k > 1 this leads to 

(3.5)    $\tilde{q}^{\,k} = \dfrac{k - 1}{k + 1}.$ 







For k ≤ 1 no mode exists, and the distribution diminishes with q. The larger 
k, the smaller is the distance from the median to the mode, and hence, the 
smaller the asymmetry. The density of probability of the mode increases with 
k, and the probability 

(3.6)    $H_k(\tilde{q}) = \tfrac{1}{2}(1 - 1/k)$ 

approaches 1/2. The distribution (3.4) belongs to the Pareto type and has no 
moments of an order equal to or greater than k. 
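
The statements about the Cauchy type can be checked numerically. The sketch below assumes that (3.4) reads $H_k(q) = q^k/(1 + q^k)$ and $h_k(q) = k q^{k-1}/(1 + q^k)^2$, which is consistent with (3.9) and with the case k = 1 given later in the text; the mode is then located from $\tilde{q}^{\,k} = (k-1)/(k+1)$. The function names are hypothetical.

```python
def H_k(q, k):
    """Asymptotic probability of the extremal quotient, assumed form of (3.4)."""
    return q ** k / (1.0 + q ** k)


def h_k(q, k):
    """Asymptotic distribution (density) of the extremal quotient, assumed form of (3.4)."""
    return k * q ** (k - 1) / (1.0 + q ** k) ** 2


def mode_k(k):
    """Mode of h_k for k > 1, from the condition that the derivative of h_k vanishes."""
    if k <= 1:
        raise ValueError("no mode exists for k <= 1")
    return ((k - 1.0) / (k + 1.0)) ** (1.0 / k)


for k in (2, 3):
    q_mode = mode_k(k)
    print(f"k = {k}: mode = {q_mode:.4f}, H at mode = {H_k(q_mode, k):.4f}, "
          f"h(1) = {h_k(1.0, k):.4f}, H(2) + H(1/2) = {H_k(2.0, k) + H_k(0.5, k):.4f}")
```

The probability at the mode equals (1 - 1/k)/2 and the density at the median equals k/4, in agreement with (3.6) and with the value stated for the median.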

In N samples of sufficiently large size n, the largest quotient $q_N$, defined in 
the same way as u in equation (2.2), obtained from (3.4) 

(3.7)    $q_N^{\,k} = N - 1$ 

increases as a root of the number of samples, i.e. very quickly. The higher the 
order of the highest moments existing, the smaller will the expected largest 
quotient be. 

From (3.4) and the symmetry (1.14) we obtain 

(3.8)    $H_k(q) - H_k(1/q) = 1 - 2/(1 + q^k).$ 

The larger k, the larger is the percentage of the observations contained in the 
interval 1/q to q. 

For a systematic estimate of k, the transformation (1.15) is used. Formula 
(3.4) leads to the probability H*(z) and the distribution h*(z) where 

(3.9)    $H^*(z) = \dfrac{1}{1 + e^{-kz}}; \quad h^*(z) = \dfrac{k e^{-kz}}{(1 + e^{-kz})^2}.$ 

The logarithm of the extremal quotient for initial distributions of the Cauchy 
type (where no moments of an order equaling or exceeding k exist) has the 
logistic distribution [6], as has the midrange $v = x_n + x_1$ for distributions of the 
exponential type (where all moments exist). The logarithm of the extremal 
quotient plotted on logistic probability paper should be scattered around a 
straight line. 

The order k of the lowest moment which diverges is obtained from the 
variance $\sigma_z^2$ of the distribution h*(z) which is [6] 

(3.10)    $\sigma_z^2 = \dfrac{\pi^2}{3k^2}.$ 

For the estimate of k from (3.10), $\sigma_z^2$ is replaced by the estimate $s_z^2$ obtained from 


(3.11) 


si 





•^n ,y 


For the Cauchy distribution itself, k = 1, and the probability and the 
distribution of the extremal quotient 

$$H(q) = q/(1 + q); \qquad h(q) = (1 + q)^{-2}$$

are similar to the initial distribution. 

The asymptotic distribution of the extremal quotient for initial distributions 
of the Cauchy type contains one parameter only, the order of the lowest 
diverging moment in the initial distribution. All other traces of the initial distribution 
have disappeared. 
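
The estimation of k from (3.10) can be sketched as follows. The example draws logarithms of extremal quotients directly from the logistic probability (3.9) by inversion and recovers k from the sample variance. It is only an illustration under stated assumptions: the usual unbiased sample variance is used, which may differ in detail from the estimate (3.11), and the function names, the seed and the sample size are hypothetical.

```python
import math
import random


def simulate_log_quotients(k, n, seed=0):
    """Draw n values of z = lg q from H*(z) = 1 / (1 + exp(-k z)) by inversion."""
    rng = random.Random(seed)
    zs = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # avoid the logarithm of zero
            u = rng.random()
        zs.append(math.log(u / (1.0 - u)) / k)
    return zs


def estimate_k(zs):
    """Estimate k from sigma_z**2 = pi**2 / (3 k**2), using the sample variance."""
    n = len(zs)
    mean = sum(zs) / n
    s2 = sum((z - mean) ** 2 for z in zs) / (n - 1)
    return math.pi / math.sqrt(3.0 * s2)


for k_true in (1.0, 2.0, 3.0):
    zs = simulate_log_quotients(k_true, 50_000)
    print(f"true k = {k_true:.1f}, estimated k = {estimate_k(zs):.3f}")
```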

4. Comparison of the extremal properties for the two types of initial 
distributions. Assume that the initial distribution is symmetrical, unlimited, and 
possesses an asymptotic distribution of the extremes. This is not always fulfilled. 
All moments may exist, and yet the distribution may not belong to the 
exponential type. No moments may exist, and yet the distribution may not belong 
to the Cauchy type. If the assumption holds, the initial distribution belongs 
either to the Cauchy, or to the exponential type. 

We take N samples of size n, and estimate the median of the population 
from the central value m of the N central values of the samples. Let $x_{1,v}$ and 
$x_{n,v}$ (v = 1, 2, ..., N) be the two extremes. If it happens for any v that 

$$x_{1,v} > m \quad \text{or} \quad x_{n,v} < m$$

the sample is too small, and its size has to be increased. The central value of 
the observed extremal quotients $q_v = (x_{n,v} - m)/(m - x_{1,v})$ must be near 
unity. 
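
The procedure of this paragraph can be sketched in code. The example below assumes, for illustration only, N = 2000 samples of size n = 50 from a standard normal population; the function name and these sizes are hypothetical. The check for too small a sample is written with non-strict inequalities, a slight strengthening of the condition in the text, so that a quotient is never formed with a zero denominator.

```python
import random
import statistics


def extremal_quotients(samples):
    """Return the central value m and the extremal quotients q_v of the samples."""
    # m is the central value of the N central values (sample medians).
    m = statistics.median(statistics.median(s) for s in samples)
    quotients = []
    for s in samples:
        smallest, largest = min(s), max(s)
        if smallest >= m or largest <= m:
            raise ValueError("sample too small: an extreme lies on the wrong side of m")
        quotients.append((largest - m) / (m - smallest))
    return m, quotients


rng = random.Random(1)
samples = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(2000)]
m, qs = extremal_quotients(samples)
print("central value m of the sample medians  :", round(m, 4))
print("central value of the extremal quotients:", round(statistics.median(qs), 4))
```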

If the initial distribution is of the exponential type, all moments in the 
population exist, and the midrange has the logistic distribution. If the initial 
distribution is of the Cauchy type, no moments of an order greater than k exist, and the 
logarithm of the extremal quotient has the logistic distribution. The order k 
can be estimated from the variance (3.11). If all moments in the population 
diverge, the calculation of the observed moments is futile since they do not 
characterize the population. 

Addendum. The referee of this paper has suggested the following method for 
obtaining an asymptotic distribution of the extremal quotient for the 
exponential type. For large values of λ, formula (2.7) becomes, approximately, 

$$H_\lambda(q) = \lambda \int_0^1 \exp\left\{-\lambda\left(z + z^q\right)\right\} dz
= \int_0^\lambda \exp\left\{-y\left[1 + \left(\frac{y}{\lambda}\right)^{q-1}\right]\right\} dy.$$

The further transformation 

$$e^t = \lambda^{q-1}; \qquad q - 1 = t/\lg\lambda$$

leads to the probability H*(t) of the variate t 

$$H^*(t) = \int_0^\lambda \exp\left\{-y\left[1 + e^{-t} y^{t/\lg\lambda}\right]\right\} dy$$

whence asymptotically for λ → ∞ 

$$H^*(t) = \int_0^\infty \exp\left\{-y\left(1 + e^{-t}\right)\right\} dy = 1/(1 + e^{-t}).$$

Therefore the logistic distribution holds at the same time for both initial types, 
using the transformation t = αu(q - 1) for the exponential type, and the 
logarithmic transformation for the Cauchy type. 


REFERENCES 

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946. 

[2] R. A. Fisher and L. H. C. Tippett, "Limiting forms of the frequency distribution 
of the smallest and the largest member of a sample," Proc. Camb. Philos. Soc., 
Vol. 24 (1928), p. 180. 

[3] M. Fréchet, "Sur la loi de probabilité de l'écart maximum," Annales Soc. Polon. Math., 
Vol. 6 (1927). 

[4] R. D. Gordon, "Values of Mills' ratio of area to bounding ordinate and of the normal 
probability integral for large values of the argument," Annals of Math. Stat., 
Vol. 12 (1941), pp. 364-366. 

[5] E. J. Gumbel, "The return period of flood flows," Annals of Math. Stat., Vol. 12 (1941), 
pp. 163-190. 

[6] E. J. Gumbel, "Ranges and midranges," Annals of Math. Stat., Vol. 15 (1944), 
pp. 414-422. 

[7] E. J. Gumbel, "On the independence of the extremes in a sample," Annals of Math. 
Stat., Vol. 17 (1946), pp. 78-81. 

[8] R. von Mises, "La distribution de la plus grande de n valeurs," Revue Math. de l'Union 
Interbalkanique, Vol. 1 (1936). 



ON A PRELIMINARY TEST FOR POOLING MEAN SQUARES
IN THE ANALYSIS OF VARIANCE¹

By A. E. Paull

Grain Research Laboratory, Winnipeg

Summary. The paper describes the consequences of performing a preliminary
F-test in the analysis of variance. The use of the 5% or 25% significance level
for the preliminary test results in disturbances that are frequently large enough
to lead to incorrect inferences in the final test. A more stable procedure is
recommended for performing the preliminary test in which the two mean squares
are pooled only if their ratio is less than twice the 50% point.

I. Introduction

The problem discussed in this paper is one of a large class involving preliminary
tests of significance. Studies of this type have recently been made by Bancroft
[1] and Mosteller [2]. Bancroft dealt with a preliminary test for homogeneity
of two variances, and a test of a regression coefficient. Mosteller dealt with the
problem of pooling means from two normal populations having the same known
variance. The present problem is an extension of Bancroft's work from
investigations of the bias and variance of an estimate of variance, to investigations
of the consequences of using that estimate in performing a further test of significance.

The problem arises frequently in the analysis of variance. As a simple example,
consider an experiment carried out to test the hypothesis that different
laboratories in a district all determine the protein content of wheat without systematic
differences between laboratories. Three laboratories are selected at random
and each is requested to analyze ten samples of the same wheat, five on each of
two days. The analysis of variance would be set up in one of two ways:


MODEL I

Source of variation            df    MS
Between laboratories            2    $v_3$
Between days within labs        3    $v_2$
Within days                    24    $v_1$


MODEL II

Source of variation            df    MS
Between laboratories            2    $v_3$
Within laboratories            27    $(3v_2 + 24v_1)/27$


The soundest procedure is to follow Model I in which the F-ratio, $v_3/v_2$,
provides a valid though not very powerful test of the null hypothesis. But the
investigator often doubts that this is the most effective form of analysis. His past
experience may have shown that measurements of this kind seldom exhibit
day-to-day variations appreciably greater than their within-day variations.
If he is willing to accept this credible assumption, he adopts Model II because
this increases the degrees of freedom from 2 and 3 to 2 and 27. These two models
may conveniently be called the “never pool” and the “always pool” procedures.
The investigator often prefers what may be called a “sometimes pool”
procedure. He starts with Model I and examines the null hypothesis that the
variation between days is no greater than the variation within days by testing
the F-ratio $v_2/v_1$. For this test, he selects a probability level $P_1$ that may be the
5% or some higher level. If the hypothesis of this preliminary test is not rejected,
his judgement has been substantiated and he adopts Model II and pools the
two mean squares. If the hypothesis is rejected, he retains Model I since he
concludes that $v_2$ alone is the only valid estimate of error.

¹ Based on a doctoral dissertation submitted to the Faculty of North Carolina State
College of the University of North Carolina at Raleigh, N. C., in June, 1948. Published as
Paper No. 107 of the Grain Research Laboratory, Board of Grain Commissioners, Winnipeg.

The following notation is introduced:

Degrees of freedom    Mean square    Expected value of mean square
$n_3$                 $v_3$          $\sigma_3^2$
$n_2$                 $v_2$          $\sigma_2^2$
$n_1$                 $v_1$          $\sigma_1^2$

where $\sigma_1^2 \le \sigma_2^2 \le \sigma_3^2$.

The mean squares $v_1$, $v_2$, and $v_3$ are assumed to be distributed as central
chi-squares. This assumption is justified if the treatments (laboratories in the
example) are selected at random from a population of treatments. But if, as is
more frequently the case, the experimenter is interested only in specified
treatments, the non-central chi-square model is the appropriate one. However, if
the two cases are sufficiently parallel, as seems probable, conclusions drawn
from the central model may be expected to apply to the non-central model.

Let $\theta_{21} = \sigma_2^2/\sigma_1^2$ and $\theta_{32} = \sigma_3^2/\sigma_2^2$, and let $F(\nu_1, \nu_2; P)$ denote the value exceeded
by $F$ for $\nu_1$ and $\nu_2$ degrees of freedom with probability $P$. The rule of procedure
for the “sometimes pool” test may be restated as follows:

Reject the main hypothesis that $\sigma_3^2 = \sigma_2^2$ ($\theta_{32} = 1$) if

$$v_2/v_1 \ge F_1(n_2, n_1; P_1) \quad\text{and}\quad v_3/v_2 \ge F_2(n_3, n_2; P_2)$$

or if

$$v_2/v_1 < F_1(n_2, n_1; P_1) \quad\text{and}\quad (n_2 + n_1)v_3/(n_2 v_2 + n_1 v_1) \ge F_3(n_3, n_2 + n_1; P_3).$$

The “never pool” procedure in which $P_2$ is used, and the “always pool” procedure
in which $P_3$ is used, may be considered as special cases of the “sometimes pool”
procedure in which $P_1$ takes on its extreme values, 1 and 0 respectively. In
practice, the probability levels $P_2$ and $P_3$ are usually the same; in the present
study they are allowed to be different in case this greater flexibility should prove
desirable. The objects of the investigation are: (a) to examine the Type I error
under the above rule of procedure, i.e., to determine the frequency of rejecting
the null hypothesis when it is true; and (b) to examine the behaviour of the power
with particular reference to comparisons with the power of the “never pool”
procedure.
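The rule of procedure can be sketched in a few lines of Python. This is only an illustration, not part of the original paper: the function name and argument order are hypothetical, and the three critical values are assumed to be looked up by the caller in a table of the F-distribution.

```python
def sometimes_pool_test(v1, v2, v3, n1, n2, f1_crit, f2_crit, f3_crit):
    """Apply the "sometimes pool" rule to three mean squares.

    The critical values are supplied by the caller (e.g. from F tables):
      f1_crit = F(n2, n1; P1)        preliminary test
      f2_crit = F(n3, n2; P2)        final test without pooling
      f3_crit = F(n3, n2 + n1; P3)   final test with pooling
    Returns a pair (pooled, reject).
    """
    if v2 / v1 >= f1_crit:
        # Preliminary test significant: retain Model I, v2 is the error term.
        return False, v3 / v2 >= f2_crit
    # Preliminary test not significant: pool v2 and v1 (Model II).
    pooled_ms = (n2 * v2 + n1 * v1) / (n2 + n1)
    return True, v3 / pooled_ms >= f3_crit


# Laboratory example (n1 = 24, n2 = 3, n3 = 2) with hypothetical mean squares
# and 5% points taken from standard tables: F(3,24) = 3.01, F(2,3) = 9.55,
# F(2,27) = 3.35.
decision = sometimes_pool_test(1.0, 1.2, 6.0, 24, 3, 3.01, 9.55, 3.35)
```

With these hypothetical figures the preliminary ratio 1.2 is not significant, so the mean squares are pooled and the final test rejects; the “never pool” test (6.0/1.2 = 5.0 against 9.55) would not have rejected.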

The remainder of this paper is divided into three sections: Part II contains a
general discussion of the results, conclusions and recommendations; and Part III
illustrates the general conclusions with numerical examples. The derivation of
distributions, proofs by elementary arguments of general qualitative results, and
derivations of closed form expressions for $n_3 = 2$, are given in Part IV.

II. General Discussion of Results, Conclusions and Recommendations

2.1. Criterion employed. In this part the principal results and
recommendations are discussed for the reader who is not interested in the mathematical
details. To give results in a simple form is not easy, because of the many variables
(the $P$'s, the $\theta$'s, and the $n$'s) that enter into the problem. It may be helpful
to consider what is wrong with the “always pool” test, and then to state the
properties which the preliminary test must have if it is to be regarded as useful
and successful.

If the “always pool” procedure is employed when in fact $\sigma_2^2$ is greater than
$\sigma_1^2$, i.e. $\theta_{21} > 1$, the denominator in the final F test tends to be too small. Thus
the final F test gives too many significant results when its null hypothesis is
true, and if $\theta_{21}$ is great enough, there is no bound to this hidden distortion of the
significance level. A test which the research worker thinks is being made at the
5% level might actually be at, say, the 47% level.
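The size of this distortion can be made concrete when $n_3 = 2$. The sketch below is an editorial derivation, not a formula quoted from the paper: for two numerator degrees of freedom, $v_3/\sigma_3^2$ is exponentially distributed, so under the null hypothesis the rejection probability of the “always pool” test reduces to a product of two chi-square moment generating functions. The function name is hypothetical and central chi-square mean squares are assumed.

```python
def always_pool_type1_n3_2(theta21, n1, n2, p3):
    """Type I error of the "always pool" test when n3 = 2.

    Assumes central chi-square mean squares, the null hypothesis
    sigma3^2 = sigma2^2, and theta21 = sigma2^2 / sigma1^2.
    """
    total_df = n1 + n2
    # With 2 numerator df the upper p3 point c of F(2, total_df) satisfies
    # (1 + 2c/total_df) ** (-total_df / 2) = p3; g below is 2c/total_df.
    g = p3 ** (-2.0 / total_df) - 1.0
    # E[exp(-c * pooled / sigma3^2)] factorises over the two chi-squares.
    return (1.0 + g) ** (-n2 / 2.0) * (1.0 + g / theta21) ** (-n1 / 2.0)


# Nominal 5% test with n1 = 20, n2 = 4 at several values of theta21.
levels = [always_pool_type1_n3_2(t, 20, 4, 0.05) for t in (1.0, 2.0, 4.5, 10.0)]
```

At $\theta_{21} = 1$ the expression returns the nominal level exactly, and it rises steadily as $\theta_{21}$ increases, which is the behaviour described in the paragraph above.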

The preliminary test represents an attempt to avoid this alarming disturbance,
since if $\theta_{21}$ is very large the test is expected to warn against pooling. Such a
procedure, however, cannot be expected to remove this disturbance completely,
and it does not do so, but to be successful it should keep the true or effective
significance level of the final F test close to the nominal level at which the
research worker thinks he is working.

A second requirement is that the preliminary test should increase the power in
the final F test relative to the power of the “never pool” test. When the powers of
the “sometimes pool” and “never pool” tests are compared, it is important to
make the comparison at the same significance level. Suppose the preliminary test
shifts the significance level of the final F test from the 5% to the 6% level, a
disturbance that for some uses would not be regarded as serious. In this event the
“sometimes pool” test (at the 6% level) would tend to be more powerful than
the “never pool” test at the 5% level, because an increase in significance level
generally results in an increase in power. But unless the “sometimes pool” test
has more power than a “never pool” test made also at the 6% level, it has no
real advantage over the “never pool” procedure.

2.2. Effect of preliminary tests made at the 5% level. Probably the most
common procedure in practice is to perform the preliminary test at the 5% level
(i.e. $P_1 = .05$) and, whether pooling is prescribed or not, to conduct the final F
test also at the 5% level (i.e. $P_2 = P_3 = .05$). Such a procedure, except when
$\theta_{21}$ is near one and the null hypothesis is true, results in the null hypothesis being
rejected more frequently than if pooling is never resorted to.

When the ratio $\theta_{21}$ is equal to one, so that routine pooling would be valid, the
preliminary test is effective. The true significance level of the final F test is
decreased slightly, but is always confined between the 5% and the 4.75% levels.
Further, the power is always greater than that of the “never pool” test made
at the same significance level.

As $\theta_{21}$ increases from 1, the true significance level of the final F test increases
to a maximum and then slowly decreases to 5%. Unfortunately the maximum
need not be near to 5%: in the example presented later it is about 15%, and for a
broad range of values of $\theta_{21}$ the true significance level is higher than 10%.
Comparison with the power of the “never pool” test is also unfavorable to the
“sometimes pool” test. For values of $\theta_{21}$ near 1, the “sometimes pool” test has the
higher power, but as $\theta_{21}$ becomes larger the advantage passes to the “never
pool” test.
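The rise and fall of the true significance level can be checked by simulation. The following is a rough Monte Carlo sketch rather than the paper's own computation: it assumes central chi-square mean squares with $n_3 = 2$, uses the closed form of the F percentage point that holds for two numerator degrees of freedom, and takes the preliminary critical value (2.87 for 4 and 20 degrees of freedom at the 5% level) from standard tables. All function names are hypothetical.

```python
import random


def f_upper_point_num_df_2(den_df, p):
    """Upper p point of F with 2 and den_df degrees of freedom (closed form)."""
    return (den_df / 2.0) * (p ** (-2.0 / den_df) - 1.0)


def mean_square(rng, df, sigma2):
    """Draw sigma2 * chi-square(df) / df; chi-square(df) is Gamma(df/2, scale 2)."""
    return sigma2 * rng.gammavariate(df / 2.0, 2.0) / df


def simulate_type1(theta21, f1_crit, n1=20, n2=4, p2=0.05, p3=0.05,
                   reps=20000, seed=1):
    """Monte Carlo estimate of the Type I error of the "sometimes pool" test (n3 = 2)."""
    rng = random.Random(seed)
    n3 = 2
    f2_crit = f_upper_point_num_df_2(n2, p2)
    f3_crit = f_upper_point_num_df_2(n1 + n2, p3)
    rejections = 0
    for _ in range(reps):
        v1 = mean_square(rng, n1, 1.0)
        v2 = mean_square(rng, n2, theta21)
        v3 = mean_square(rng, n3, theta21)  # null hypothesis: sigma3^2 = sigma2^2
        if v2 / v1 >= f1_crit:
            reject = v3 / v2 >= f2_crit     # no pooling
        else:
            pooled = (n2 * v2 + n1 * v1) / (n2 + n1)
            reject = v3 / pooled >= f3_crit  # pooling
        rejections += int(reject)
    return rejections / reps


q_equal = simulate_type1(1.0, 2.87)  # theta21 = 1
q_mid = simulate_type1(4.5, 2.87)    # an intermediate value of theta21
```

The estimates carry sampling error of a few tenths of one per cent at 20,000 replications, so they indicate the general shape of the disturbance rather than exact levels.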

When $\theta_{21}$ is very large there is, as would be expected, little disturbance. The
preliminary test seldom prescribes pooling, so that the properties of the
“sometimes pool” test are very similar to those of the “never pool” test, although the
“never pool” procedure yields the slightly higher power.

The main objection to the use of the “sometimes pool” test is associated with
the intermediate values of $\theta_{21}$. If over a series of experiments $\theta_{21}$ has a moderate
value greater than one, the “sometimes pool” test at the 5% levels yields more
apparently significant results than are anticipated, and is also less powerful
than a corresponding “never pool” test. The magnitude of these undesirable
properties can be reduced somewhat by increasing the significance level of the
preliminary test.

2.3. Effect of preliminary tests made at the 25% level. Use of the 25%
instead of the 5% significance level for the preliminary test reduces, in general, the
probability of rejecting the hypothesis. This reduction, at intermediate values
of $\theta_{21}$, results in a reduction of the extreme disturbances. When the ratio $\theta_{21}$ is
equal to one, however, the effects are not as favourable. If the hypothesis is
true, still fewer apparently significant results occur. A final test being carried
out at the 5% level can now have an effective significance level close to 3.75%.
If the hypothesis is false, the test is still more powerful than a corresponding
“never pool” test but the gain is not as great as when a preliminary test at the
5% level is employed. Since most experimenters desire a reasonable amount of
protection against an error in judgement of the true value of $\theta_{21}$, the reduction
in disturbances for intermediate values of $\theta_{21}$, resulting from the use of the 25%
rather than the 5% level, would be judged to outweigh the disadvantages of the
compensating factors.

2.4. Effect of further increases in significance level. Increasing $P_1$, the
significance level of the preliminary test, decreases the probability of rejecting
the hypothesis only to the point where a critical value $\bar{P}_1$ is reached. Increasing
$P_1$ beyond this value results in an increase in the probability of rejection. The
properties of a “sometimes pool” test in which $P_1$ is less than $\bar{P}_1$ differ, in general,
from those of a test in which $P_1$ is greater than $\bar{P}_1$.

Tests of the former type, which are referred to here as Class A tests, are the
tests commonly encountered in practice. Considering, for example, a test in
which $P_2 = P_3 = .05$ and $n_1 = 20$, $n_2 = 4$, $n_3 = 2$, we find the critical value $\bar{P}_1$
to be .77, a figure much larger than the values .05 or .25 customarily chosen
for $P_1$. The major portion of the present discussion deals with Class A tests.
Tests in which $P_1$ is greater than $\bar{P}_1$ are referred to as Class B tests and discussion
of their properties is relegated to a later section. An expression for evaluating
$\bar{P}_1$ is given in Subsection 4.3.


2.5. Effect of $P_2$, $P_3$. The probability levels ($P_2$, $P_3$) used for the final test
determine the properties of the “sometimes pool” test for extreme values of $\theta_{21}$.
When $\theta_{21}$ is equal to one, the effective significance level is less than the nominal
value $P_3$, but is not less than $(1 - P_1)P_3$. The power of such a test is greater
than the power of a corresponding “never pool” test, but less than the power of a
test in which one always pools and uses the $P_3$ level. For very large values of $\theta_{21}$
the behaviour of the “sometimes pool” test approaches, in all respects, the
behaviour of a “never pool” test at the $P_2$ level.
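The bounds at $\theta_{21} = 1$ are simple enough to tabulate directly. The short sketch below (the function name is hypothetical) reproduces the 4.75% and 3.75% figures quoted in Subsections 2.2 and 2.3 for preliminary tests at the 5% and 25% levels; it applies to Class A tests only.

```python
def class_a_type1_bounds(p1, p3):
    """Lower and upper bounds on the effective significance level of a
    Class A "sometimes pool" test when theta21 = 1: ((1 - P1) * P3, P3)."""
    return (1.0 - p1) * p3, p3


bounds_5 = class_a_type1_bounds(0.05, 0.05)   # preliminary test at 5%
bounds_25 = class_a_type1_bounds(0.25, 0.05)  # preliminary test at 25%
```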


2.6. Effect of $n_2$, $n_1$. The degrees of freedom $n_2$ and $n_1$, associated with the
mean squares that are sometimes pooled, clearly affect the magnitude of the
disturbance. Because analytic investigation becomes complex, the following
remarks are based on conjectures arising out of examination of a number of
numerical examples.

A large value of $n_2$ is desirable in two respects. As $n_2$ becomes larger the
preliminary test becomes more powerful and pooling is prescribed less often. In
addition, when pooling is prescribed the pooled mean square is further weighted
in favour of the valid error $\sigma_2^2$. Both factors contribute towards a decrease
in bias of the error mean square with a consequent reduction in the disturbance
introduced into the final test.

The effect of $n_1$ is not as simple. As $n_1$ becomes larger the preliminary test
again becomes more powerful and pooling is prescribed less often. But when
pooling is prescribed, the pooled mean square in this case is further weighted in
favour of $\sigma_1^2$, which is smaller than the valid error $\sigma_2^2$. The effect on the final
test, which is due to a combination of these two factors, clearly depends on the
value of $\theta_{21}$. For intermediate values of $\theta_{21}$ the latter factor is the predominant
one, and the disturbance of the effective significance level is increased as $n_1$ is
increased.

2.7. Class B Test. A Class B test is one in which the probability level ($P_1$)
of the preliminary test is greater than a critical value $\bar{P}_1$. Pooling is prescribed
only when the mean square $v_1$ is relatively large, with the result that the pooled
mean square tends to be too large. Accordingly, a Class B “sometimes pool”
test rejects the hypothesis less frequently than a “never pool” test.

The effective significance level of a Class B test is less than $P_2$ for all values
of $\theta_{21}$. It has its lowest value when $\theta_{21}$ is equal to one, and approaches $P_2$ as $\theta_{21}$
becomes very large. Because pooling is prescribed infrequently, little power is
gained by the use of a Class B test rather than a “never pool” test.

2.8. Recommendations. The principal conclusions discussed in the preceding
subsections may be summarized as follows: A preliminary test carried out at a
significance level as low as 5% affords little protection against errors in
judgement. If $\sigma_2^2$ is equal to $\sigma_1^2$ ($\theta_{21} = 1$) the reduction in errors of inference is
appreciable; but if, in fact, $\sigma_1^2$ is less than $\sigma_2^2$ ($\theta_{21} > 1$), a greater number of incorrect
inferences are made than if a preliminary test is not employed at all. The use
of the 25% significance level for the preliminary test introduces the same
disturbances but to a lesser extent. Extreme increases in the effective significance
level at possible values of $\theta_{21}$ are reduced and losses in power at these values are
not as serious. The 25% level provides a reasonable amount of protection against
an error in judgement regarding the true value of $\theta_{21}$. However, when $n_2$ is
large relative to $n_1$, a smaller significance level could be employed without
introducing any serious disturbances at the intermediate values of $\theta_{21}$, and
with a resulting gain in power at values of $\theta_{21}$ near one.

The following method of performing a preliminary test is recommended as one
which tends to stabilize the disturbances at intermediate values of $\theta_{21}$ while still
taking advantage of a considerable portion of the possible gain in power at
values of $\theta_{21}$ near one. The procedure consists of pooling the two mean squares
$v_2$ and $v_1$ only if their ratio is less than $2F_{50}$, where $F_{50}$ is the 50 per cent point
of the F-distribution for $n_2$ and $n_1$ degrees of freedom. The use of the multiple 2
is arbitrary and a smaller value may be used if the experimenter desires additional
control over extreme disturbances.

This procedure has the advantage of admitting less disturbance over a larger
range of values of $n_2$ and $n_1$. The customary method prescribes pooling if the null
hypothesis ($\theta_{21} = 1$) of the preliminary test is not rejected at some preassigned
probability level $P_1$. If enough observations are available to provide reliable
values for $v_2$ and $v_1$, pooling is prescribed only if $\sigma_2^2$ and $\sigma_1^2$ are essentially the same.
However, if small numbers of degrees of freedom are involved, the preliminary
test is too weak to reject the hypothesis even if $\sigma_1^2$ is appreciably less than $\sigma_2^2$,
and pooling will be prescribed too frequently. On the other hand, the use of the
recommended procedure has the effect of prescribing pooling only when it can
be said, with confidence exceeding 50%, that the true value of $\theta_{21}$ is less than
some chosen value such as 2.

This can be demonstrated simply by considering a series of experiments
in which preliminary tests are performed. When $v_2/v_1 < 2F_{50}$, we make the
statement

(1) $\theta_{21} < 2$,

and when $v_2/v_1 > 2F_{50}$, we make the statement

(2) $\theta_{21} > 2$.

We have

$$\Pr\left\{\frac{v_2}{v_1}\cdot\frac{1}{\theta_{21}} > F_{50}\right\} = .50,$$

or

$$\Pr\left\{\frac{v_2}{v_1} > F_{50}\,\theta_{21}\right\} = .50.$$

If statement (1) is true,

$$\Pr\left\{\frac{v_2}{v_1} < 2F_{50}\right\} > .50;$$

and if statement (2) is true,

$$\Pr\left\{\frac{v_2}{v_1} > 2F_{50}\right\} > .50.$$

Thus, no matter what the true value of $\theta_{21}$, the statements are true more
than 50% of the time.

Fifty per cent points of the F-distribution have been tabulated by Merrington
and Thompson [3].

A simpler rule, and one which is nearly equivalent when the degrees of freedom
involved are each greater than 6, is to pool if the ratio of the mean squares is less
than 2, without any reference to the F-table. For smaller numbers of degrees of
freedom, however, this simpler rule does not embody the advantages of the
$2F_{50}$ rule, unless of course, $n_2$ and $n_1$ are equal.
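Both pooling rules amount to a single comparison. The sketch below is illustrative only: the function names are hypothetical, and the 50 per cent point is assumed to be read from tables for $n_2$ and $n_1$ degrees of freedom and passed in by the caller.

```python
def pool_by_2f50_rule(v2, v1, f50, multiple=2.0):
    """Recommended rule: pool v2 and v1 only if v2/v1 < multiple * F50.

    f50 is the 50 per cent point of F for n2 and n1 degrees of freedom,
    supplied by the caller. A multiple below 2 gives tighter control over
    extreme disturbances.
    """
    return v2 / v1 < multiple * f50


def pool_by_simple_rule(v2, v1):
    """Simpler rule: pool if the ratio of the mean squares is less than 2."""
    return v2 / v1 < 2.0


# With equal degrees of freedom the median of F is exactly 1,
# so the two rules coincide.
same_df_pool = pool_by_2f50_rule(1.8, 1.0, f50=1.0)
same_df_no_pool = pool_by_2f50_rule(2.3, 1.0, f50=1.0)
```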


III. Numerical Illustrations


3.1. Effect of $P_1$ illustrated. An example of the influence of $P_1$ on the effective
significance level or Type I error of a “sometimes pool” test is illustrated in
Figure 1. When $P_1 = 0$, the Type I error has its maximum value equivalent to
the Type I error of an “always pool” test at the $P_3$ level. As $P_1$ increases from
zero, the Type I error decreases until at $P_1 = \bar{P}_1$ (.77 in this case) it reaches its
minimum value at a level less than $P_2$. As $P_1$ increases from $\bar{P}_1$, the Type I
error increases until, at $P_1 = 1$, the Type I error is equal to $P_2$.
The influence of $P_1$ on the power of a “sometimes pool” test is illustrated in
Figure 2. The gain in power, as a function of $\theta_{21}$, is presented for three Class A
tests. Since comparisons of power are made over tests having different Type I
errors, the gain is expressed as the proportion actually attained of the total
gain in power that is possible if the true value of $\theta_{21}$ is actually known. When
$P_1 = \bar{P}_1 = .77$, the curve is observed to decrease monotonically to zero. However,
for lower values of $P_1$, the preliminary test prescribes pooling more often, and
more power is gained when $\theta_{21}$ is near one but less power is gained or power is
actually lost when $\theta_{21}$ is large.





The power gained or lost at various values of $\theta_{21}$ is illustrated in Table I.
The probability of rejecting the hypothesis for the “sometimes pool” test is
tabulated opposite “s.p.”, and for the “never pool” test having the same Type I
error opposite “n.p.”.

Fig. 1. Effect of Varying $P_1$. $n_1 = 20$, $n_2 = 4$, $n_3 = 2$ and $P_2 = P_3 = .05$. (a) Upper
diagram: Class A Tests. (b) Lower diagram: Class B Tests.

Fig. 2. Proportion of Possible Gain in Power Actually Attained. $n_1 = 20$, $n_2 = 4$, $n_3 = 2$,
$P_2 = P_3 = .05$.





The last line of the table approaches the probabilities for a “never pool” test
having a Type I error of 5%. Except for values very near $(\theta_{21}, \theta_{32}) = (1, 1)$, the
probability of rejecting the null hypothesis, using a “sometimes pool” test, is
greater than if a “never pool” test at the 5% level is used. In this sense, the
“power” of the “sometimes pool” test is greater everywhere except near
$(\theta_{21}, \theta_{32}) = (1, 1)$.

TABLE I

Comparison of Power of a “Sometimes Pool” (s.p.) Test and Corresponding “Never Pool”
(n.p.) Tests

$n_1 = 20$, $n_2 = 4$, $n_3 = 2$; $P_1 = P_2 = P_3 = .05$

Value of                               Value of $\theta_{32}$
$\theta_{21}$   Test      1    1.8    2.8    4.3    7.1   12.5     25     50    250

  1.0     s.p.   .048   .164   .299   .443   .599   .739   .865   .922   .984
          n.p.   .048   .112   .192   .297   .441   .604   .765   .870   .972
  1.2     s.p.   .067   .200   .338   .476   .621   .751   .860   .925   .984
          n.p.   .067   .149   .246   .361   .508   .662   .805   .895   .978
  1.5     s.p.   .102   .248   .379   .503   .632   .750   .856   .921   .983
          n.p.   .102   .210   .323   .447   .592   .730   .849   .920   .983
  2.0     s.p.   .127   .271   .390   .500   .619   .736   .845   .916   .981
          n.p.   .127   .250   .370   .497   .636   .764   .870   .932   .986
  2.5     s.p.   .146   .278   .382   .482   .596   .716   .831   .907   .976
          n.p.   .146   .278   .402   .528   .664   .784   .882   .938   .987
  4.5     s.p.   .148   .233   .309   .399   .520   .657   .796   .887   .976
          n.p.   .148   .280   .405   .531   .666   .786   .883   .939   .987
  7.0     s.p.   .117   .182   .265   .350   .482   .632   .781   .880   .974
          n.p.   .117   .234   .352   .478   .620   .751   .862   .927   .985
 10       s.p.   .091   .162   .227   .327   .466   .621   .776   .877   .974
          n.p.   .091   .191   .300   .422   .569   .712   .838   .913   .982
 15       s.p.   .067   .130   .209   .313   .456   .615   .773   .875   .973
          n.p.   .067   .149   .246   .361   .509   .662   .805   .895   .978
100       s.p.   .051   .117   .200   .307   .452   .613   .771   .875   .973
          n.p.   .051   .118   .201   .308   .454   .615   .773   .875   .973

Below the heavy line the s.p. test is less powerful than the n.p. test.
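The “never pool” entries of Table I can be checked independently, because for $n_3 = 2$ the upper tail of the F-distribution has an elementary closed form. The sketch below is an editorial check rather than the method of Subsection 4.7; the function name is hypothetical and central chi-square mean squares are assumed.

```python
def never_pool_power_n3_2(theta32, type1, n2=4):
    """Power of a "never pool" test with n3 = 2 numerator degrees of freedom.

    For F with 2 and n2 degrees of freedom, Pr{F > f} = (1 + 2f/n2) ** (-n2/2).
    The test with Type I error `type1` therefore has critical value
    f0 = (n2/2) * (type1 ** (-2/n2) - 1), and under the alternative the
    statistic v3/v2 is distributed as theta32 * F, so the power is
    Pr{F > f0/theta32}.
    """
    g = type1 ** (-2.0 / n2) - 1.0
    return (1.0 + g / theta32) ** (-n2 / 2.0)


# n.p. entries for the rows with Type I errors .127 and .146.
check_a = never_pool_power_n3_2(1.8, 0.127)
check_b = never_pool_power_n3_2(25.0, 0.127)
check_c = never_pool_power_n3_2(4.3, 0.146)
```

The computed values agree with the tabulated .250, .870 and .528 to within rounding of the three-figure Type I errors.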

3.2. Effect of $P_2$, $P_3$ illustrated. The influence of the probability levels
employed in the final phase of a “sometimes pool” test is illustrated in Figure 3.
The main effect is observed to be the manner in which the behaviour is
constrained at the extreme values of $\theta_{21}$.






Fig. 3. Class A Tests. $n_1 = 20$, $n_2 = 4$, $n_3 = 2$.

Fig. 4. (a) Upper Diagram: Effect of Varying $n_2$. $P_1 = P_2 = P_3 = .05$ and $n_1 = 20$,
$n_3 = 2$. (b) Lower Diagram: Effect of Varying $n_1$. $P_1 = P_2 = P_3 = .05$ and $n_2 = 4$, $n_3 = 2$.

3.3. Effect of $n_2$, $n_1$ illustrated. The response of the Type I error to increases
in the degrees of freedom of the preliminary test is illustrated in Figure 4. The
maximum disturbance is observed to increase as $n_1$ increases or as $n_2$ decreases.







3.4. Class B test illustrated. The behaviour of the Type I error of some Class
B tests is illustrated in Figure 1(b). The hypothesis is always rejected less
frequently than if a “never pool” test at the $P_2$ level is used.




Fig. 5. (a) Upper Diagram: Effect of Varying $n_2$ when $F_1 = 2F_{50}$, $P_2 = P_3 = .05$ and
$n_1 = 20$, $n_3 = 2$. (b) Lower Diagram: Effect of Varying $n_1$ when $F_1 = 2F_{50}$, $P_2 = P_3 = .05$
and $n_2 = 4$, $n_3 = 2$.

Fig. 6. Effect of Varying $n_1$ when $P_2 > P_3$. $P_2 = .10$, $P_3 = .05$ and $n_2 = 4$, $n_3 = 2$.


3.5. Recommended procedure illustrated. Figure 5 illustrates the behaviour
of the Type I error when the recommended procedure is applied to the special
cases in Figure 4. When $n_1 = 12$, $n_2 = 4$, the 20% probability level is
prescribed and the Type I error never exceeds .09. When $n_1 = 20$, $n_2 = 20$, the
more liberal value of 6% is prescribed and the resulting Type I error never
exceeds .07. The more liberal choice of $P_1$ results in a greater gain of power,
near $\theta_{21} = 1$, than would have resulted if the 20% level had been used throughout.
A small loss in power occurs when $\theta_{21}$ is large. Should the experimenter wish to
guard against this loss in power for a larger range of values of $\theta_{21}$ near one, he
may do so, at the expense of a somewhat larger disturbance in the Type I error,
by choosing $P_2$ larger than $P_3$. In the present example, if $P_2$ is taken as .10
instead of .05, Figure 6 shows that the Type I error is changed only slightly for
values of $\theta_{21}$ near one, but the maximum disturbance is increased. Such a test is
uniformly more powerful than the “never pool” test for all values of $\theta_{21}$ for which
the Type I error is less than .10; a much larger range of values than in the
previous case.

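IV. Derivations and Proofs


4.1. Derivation of joint frequency function. The joint frequency function of
the $v$'s is given by

$$C_1\, v_1^{\frac12 n_1 - 1}\, v_2^{\frac12 n_2 - 1}\, v_3^{\frac12 n_3 - 1} \exp\left\{-\frac12\left[\frac{n_1 v_1}{\sigma_1^2} + \frac{n_2 v_2}{\sigma_2^2} + \frac{n_3 v_3}{\sigma_3^2}\right]\right\},$$

where $C_1$ is independent of the $v$'s. Transform to the new variables

$$u_1 = \frac{n_2 v_2}{n_1 v_1}, \qquad u_2 = \frac{n_3 v_3}{n_2 v_2},$$

together with a third variable proportional to $n_1 v_1$. By integrating with respect
to the third variable and evaluating the constant, the joint frequency function of
$u_1$ and $u_2$ is obtained:

(3) $$p = \frac{\theta_{21}^{\frac12 n_1}\, \theta_{32}^{\frac12 (n_1 + n_2)}\, u_1^{\frac12 (n_2 + n_3) - 1}\, u_2^{\frac12 n_3 - 1}}{B(\tfrac12 n_1, \tfrac12 n_2)\, B(\tfrac12 n_3, \tfrac12 (n_1 + n_2))\, (\theta_{21}\theta_{32} + \theta_{32} u_1 + u_1 u_2)^{\frac12 (n_1 + n_2 + n_3)}},$$

where $\theta_{21} = \sigma_2^2/\sigma_1^2$; $\theta_{32} = \sigma_3^2/\sigma_2^2$.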

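4.2. Definition of critical region. The rule of procedure for the “sometimes
pool” test may now be expressed in terms of the $u$'s. Reject the hypothesis
$\theta_{32} = 1$ if

$$u_1 \ge u_1^0 \quad\text{and}\quad u_2 \ge u_2^0,$$

or

$$u_1 < u_1^0 \quad\text{and}\quad \frac{u_1 u_2}{1 + u_1} \ge u_3^0,$$

where

$$u_1^0 = \frac{n_2}{n_1}\, F_1(n_2, n_1; P_1),$$

$$u_2^0 = \frac{n_3}{n_2}\, F_2(n_3, n_2; P_2),$$

$$u_3^0 = \frac{n_3}{n_2 + n_1}\, F_3(n_3, n_2 + n_1; P_3).$$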





The reader will note that the $u$'s are ratios of sums of squares. The symbol $u_1$
is associated with the preliminary test. The final test when pooling is not
prescribed is associated with the symbol $u_2$, and when pooling is prescribed the
relevant statistic is $u_1 u_2/(1 + u_1)$.
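The correspondence between the mean squares and the $u$'s can be verified numerically. The sketch below uses hypothetical function names and hypothetical mean squares; the critical values of F are assumed to come from tables. It checks that the pooled statistic $u_1 u_2/(1 + u_1)$ equals $n_3 v_3/(n_1 v_1 + n_2 v_2)$, so that the rule stated in terms of the $u$'s gives the same decision as the rule stated in terms of F-ratios.

```python
def u_statistics(v1, v2, v3, n1, n2, n3):
    """Ratios of sums of squares: u1 = n2*v2/(n1*v1), u2 = n3*v3/(n2*v2)."""
    return (n2 * v2) / (n1 * v1), (n3 * v3) / (n2 * v2)


def critical_u_values(n1, n2, n3, f1_crit, f2_crit, f3_crit):
    """Critical values of the u's from the F critical values."""
    return ((n2 / n1) * f1_crit,
            (n3 / n2) * f2_crit,
            (n3 / (n2 + n1)) * f3_crit)


def reject_in_u_space(u1, u2, u1_0, u2_0, u3_0):
    """Critical region of the "sometimes pool" test in the (u1, u2) plane."""
    if u1 >= u1_0:
        return u2 >= u2_0                    # no pooling
    return u1 * u2 / (1.0 + u1) >= u3_0      # pooling


# Hypothetical mean squares for n1 = 24, n2 = 3, n3 = 2, with 5% points
# F(3,24) = 3.01, F(2,3) = 9.55, F(2,27) = 3.35 from standard tables.
u1, u2 = u_statistics(1.0, 1.2, 6.0, 24, 3, 2)
u1_0, u2_0, u3_0 = critical_u_values(24, 3, 2, 3.01, 9.55, 3.35)
pooled_stat = u1 * u2 / (1.0 + u1)
rejected = reject_in_u_space(u1, u2, u1_0, u2_0, u3_0)
```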

The critical region defined in this way is illustrated in the two dimensional
sample space $(u_1, u_2)$ of Figure 7(a). The critical regions of the “never pool”
and the “always pool” test are readily identified in this figure. The region of a
“never pool” test at the $P_2$ level is designated by $A + B_2 + C$, the area above
the line $u_2 = u_2^0$; and the region of an “always pool” test at the $P_3$ level is
designated by $B_1 + B_2 + C + D$, the area above the curve $u_1 u_2 = u_3^0(1 + u_1)$.
The critical region of the “sometimes pool” test, $B_1 + B_2 + C$, may be considered
in two parts; the portion due to pooling, $B_1 + B_2$, and the portion due to not
pooling, $C$.

Fig. 7. Critical Region of “Sometimes Pool” Test. (a) Left: Class A Test; $u_1^0 > \bar{u}_1$. (b)
Right: Class B Test; $u_1^0 < \bar{u}_1$.


The probability of rejecting the null hypothesis is given by

(4) $$Q(\theta_{21}, \theta_{32}) = \int_{u_1^0}^{\infty}\int_{u_2^0}^{\infty} p\, du_2\, du_1 + \int_0^{u_1^0}\int_{w}^{\infty} p\, du_2\, du_1,$$

where $p$ is the frequency function (3), and $w = u_3^0(1 + u_1)/u_1$.

Simple explicit expressions for these integrals cannot be obtained in general,
but when $n_3 = 2$ they can be reduced to forms containing incomplete beta
functions. This special case is dealt with in Subsection 4.7.


4.3. Critical value of $P_1$. The symbol $\bar{u}_1$ in Figure 7 is used to denote the
$u_1$ coordinate of the point of intersection of the line $u_2 = u_2^0$ and the curve
$u_1 u_2 = u_3^0(1 + u_1)$. Accordingly,

(5) $$\bar{u}_1 = \frac{u_3^0}{u_2^0 - u_3^0},$$

a value readily determined for any given test. This relationship may be expressed
in terms of the $F$'s as

(6) $$\bar{F}_1 = \frac{n_1 F_3}{(n_1 + n_2) F_2 - n_2 F_3},$$

where $\bar{F}_1$ is defined by $n_1 \bar{u}_1 = n_2 \bar{F}_1$. The probability level corresponding to
$\bar{F}_1$ is denoted by $\bar{P}_1$.

The critical value $\bar{P}_1$ is the value of $P_1$ which divides the possible “sometimes
pool” tests into two types having different properties. If $P_1$ is less than $\bar{P}_1$ ($F_1 > \bar{F}_1$
or $u_1^0 > \bar{u}_1$), the test is referred to as a Class A test. If $P_1$ is greater than $\bar{P}_1$ ($F_1 < \bar{F}_1$
or $u_1^0 < \bar{u}_1$), the test is referred to as a Class B test.
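The critical value can be evaluated numerically for the example of Subsection 2.4 ($n_1 = 20$, $n_2 = 4$, $n_3 = 2$, $P_2 = P_3 = .05$). The sketch below is an editorial illustration with hypothetical function names. It takes the critical F of the preliminary test as $n_1 F_3 / [(n_1 + n_2) F_2 - n_2 F_3]$, which follows from the intersection of the line $u_2 = (n_3/n_2)F_2$ with the pooling curve, and it relies on elementary closed forms of the F-distribution that hold only for 2 and for 4 numerator degrees of freedom.

```python
def f_upper_point_num_df_2(den_df, p):
    """Upper p point of F with 2 and den_df degrees of freedom (closed form)."""
    return (den_df / 2.0) * (p ** (-2.0 / den_df) - 1.0)


def critical_f1(n1, n2, f2_crit, f3_crit):
    """Critical preliminary F: n1*F3 / ((n1 + n2)*F2 - n2*F3)."""
    return n1 * f3_crit / ((n1 + n2) * f2_crit - n2 * f3_crit)


def f_upper_tail_num_df_4(f, den_df):
    """Pr{F > f} for 4 and den_df degrees of freedom.

    Uses 1 - I_x(2, b) = (1 - x)**b * (1 + b*x), with b = den_df/2 and
    x = 4f / (4f + den_df).
    """
    b = den_df / 2.0
    x = 4.0 * f / (4.0 * f + den_df)
    return (1.0 - x) ** b * (1.0 + b * x)


f2_crit = f_upper_point_num_df_2(4, 0.05)        # F(2, 4; .05)
f3_crit = f_upper_point_num_df_2(24, 0.05)       # F(2, 24; .05)
f1_bar = critical_f1(20, 4, f2_crit, f3_crit)    # critical preliminary F
p1_bar = f_upper_tail_num_df_4(f1_bar, 20)       # critical probability level
```

The resulting probability level is close to the value .77 quoted in Subsection 2.4, which gives some reassurance that the relationship has been read correctly.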

4.4. Lemma 1.

LEMMA 1. If $\theta'_{21} \ge \theta_{21}$ and $\theta'_{32} \ge \theta_{32}$, and if the inequality applies in one of these,
then the ratio of the frequency functions (3)

(7) $$\frac{p(u_1, u_2 \mid \theta'_{21}, \theta'_{32})}{p(u_1, u_2 \mid \theta_{21}, \theta_{32})}$$

increases monotonically as (i) $u_1$ increases with $u_2$ fixed, or as (ii) $u_2$ increases
with $u_1$ fixed, or as (iii) $u_1$ increases on a fixed pooling curve $u_1 u_2 = u_3(1 + u_1)$.

PROOF. The ratio (7) is a monotonic function of

$$\frac{\theta_{21}\theta_{32} + \theta_{32} u_1 + u_1 u_2}{\theta'_{21}\theta'_{32} + \theta'_{32} u_1 + u_1 u_2}.$$

It is easily shown that an expression of the form $(a + bx)/(c + dx)$ increases
monotonically with respect to $x$ if $a/c < b/d$, and this condition holds for cases
(i), (ii), and (iii).
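The monotonicity condition used in the proof follows from a one-line differentiation. The step below is supplied for completeness and is not in the original text; it assumes $c, d > 0$, which holds here because the $\theta$'s and $u$'s are positive. Case (i) is shown as an example of how the coefficients are identified.

```latex
\frac{d}{dx}\,\frac{a + bx}{c + dx} = \frac{bc - ad}{(c + dx)^{2}} > 0
\iff bc > ad \iff \frac{a}{c} < \frac{b}{d} \qquad (c, d > 0).

% Case (i): x = u_1 with u_2 fixed
a = \theta_{21}\theta_{32}, \quad b = \theta_{32} + u_2, \quad
c = \theta'_{21}\theta'_{32}, \quad d = \theta'_{32} + u_2,

\frac{a}{c} = \frac{\theta_{21}}{\theta'_{21}}\cdot\frac{\theta_{32}}{\theta'_{32}}
\le \frac{\theta_{32}}{\theta'_{32}}
\le \frac{\theta_{32} + u_2}{\theta'_{32} + u_2} = \frac{b}{d}.
```

At least one of the two inequalities in the last line is strict when $\theta'_{21} > \theta_{21}$ or $\theta'_{32} > \theta_{32}$ (with $u_2 > 0$), which gives $a/c < b/d$.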

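4.5. Lemma 2.

LEMMA 2. If area $L$ lies above a given pooling curve, and to the right of a given
preliminary line, if area $K$ lies below the same pooling curve, and to the left of the
same preliminary line, and if

$$\Pr\{L \mid \theta_{21}, \theta_{32}\} \ge \Pr\{K \mid \theta_{21}, \theta_{32}\},$$

then

$$\Pr\{L \mid \theta'_{21}, \theta'_{32}\} > \Pr\{K \mid \theta'_{21}, \theta'_{32}\},$$

where $\theta'_{21} \ge \theta_{21}$ and $\theta'_{32} \ge \theta_{32}$ and the inequality applies in one of these.

PROOF. For any point $(u_1, u_2)$ in $K$ and any point $(u'_1, u'_2)$ in $L$, Lemma 1
(iii) yields

$$\frac{p(u_1, \bar{u}_2 \mid \theta'_{21}, \theta'_{32})}{p(u_1, \bar{u}_2 \mid \theta_{21}, \theta_{32})} \le \frac{p(u'_1, u'_2 \mid \theta'_{21}, \theta'_{32})}{p(u'_1, u'_2 \mid \theta_{21}, \theta_{32})},$$

where $\bar{u}_2 = c(1 + u_1)/u_1$, and $c$ is a constant defined by $u'_2 = c(1 + u'_1)/u'_1$.
Since $K$ is below a given pooling curve, $u_2 < \bar{u}_2$ and

$$\frac{p(u_1, u_2 \mid \theta'_{21}, \theta'_{32})}{p(u_1, u_2 \mid \theta_{21}, \theta_{32})} \le \frac{p(u_1, \bar{u}_2 \mid \theta'_{21}, \theta'_{32})}{p(u_1, \bar{u}_2 \mid \theta_{21}, \theta_{32})}.$$

Consider

$$\frac{p(u_1, u_2 \mid \theta'_{21}, \theta'_{32})}{p(u_1, u_2 \mid \theta_{21}, \theta_{32})} < b < \frac{p(u'_1, u'_2 \mid \theta'_{21}, \theta'_{32})}{p(u'_1, u'_2 \mid \theta_{21}, \theta_{32})},$$

where $b$ is a constant such that the inequalities hold for all $(u_1, u_2)$ in $K$ and
all $(u'_1, u'_2)$ in $L$.

Integrating over the regions yields

$$\Pr\{K \mid \theta'_{21}, \theta'_{32}\} < b \cdot \Pr\{K \mid \theta_{21}, \theta_{32}\}$$

and

$$b \cdot \Pr\{L \mid \theta_{21}, \theta_{32}\} < \Pr\{L \mid \theta'_{21}, \theta'_{32}\}.$$

But

$$\Pr\{K \mid \theta_{21}, \theta_{32}\} \le \Pr\{L \mid \theta_{21}, \theta_{32}\},$$

thus

$$\Pr\{K \mid \theta'_{21}, \theta'_{32}\} < \Pr\{L \mid \theta'_{21}, \theta'_{32}\},$$

which completes the proof.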


4.6. General Properties. 

Result 1. When $\theta_{21} = 1$, the Type I error of a Class A test is less than $P_3$.

Proof. In the notation of Fig. 7(a), the probability of falling in $B_1 + B_2 + C + D$
is $P_3$ when $\theta_{21} = 1$ and $\theta_{32} = 1$. The region of rejection of the "sometimes
pool" test is smaller by $D$.

Result 2. When $\theta_{21} = 1$, the Type I error of a Class A test is greater than
$(1 - P_1)P_3$.

Proof. The statistics $u_1$ and $u_1 u_2/(1 + u_1)$ are independent when $\theta_{21} = 1$ and
$\theta_{32} = 1$. Under these conditions, the probability of falling in $B_1 + B_2$, in the
notation of Fig. 7(a), is equal to the product of two incomplete beta functions
having the values $(1 - P_1)$ and $P_3$. Consequently, the Type I error is greater
than $(1 - P_1)P_3$.

Result 3. The Type I error approaches $P_2$ as $\theta_{21}$ approaches infinity.

Proof. The distribution becomes singular when $\theta_{21} = \infty$. The frequency
function approaches zero uniformly for any finite value of $u_1$ and approaches

$$\frac{1}{B(\tfrac{1}{2} n_3, \tfrac{1}{2} n_2)} \cdot \frac{u_2^{\frac{1}{2} n_3 - 1}}{(1 + u_2)^{\frac{1}{2}(n_2 + n_3)}}$$

at $u_1 = \infty$. When $\theta_{21} = \infty$, the entire mass is concentrated on the line $u_1 = \infty$
and is distributed as a beta variable along that line. In the notation of Fig.
7(a), $\Pr(B_1 + B_2) \to 0$ and $\Pr(C) \to P_2$.

Result 4. If the Type I error of a Class A test is $Q_0$ for $\theta_{21}$, then for $\theta_{21}' > \theta_{21}$,
the Type I error is greater than $r$, where $r$ is equal to the lesser of $Q_0$ and $P_2$.





Three useful corollaries are associated with the above result:

Result 4.1. If at $\theta_{21} = 1$ the value of the Type I error is less than $P_2$, this is
its minimum value for any $\theta_{21}$.

Result 4.2. If at $\theta_{21} = 1$ the Type I error is less than $P_2$, then as $\theta_{21}$ increases
from 1 the Type I error increases monotonically until $P_2$ is reached.

Result 4.3. If for some value of $\theta_{21}$ the Type I error is equal to or greater than
$P_2$, then for any larger value of $\theta_{21}$ the Type I error is greater than $P_2$.

Proof. Let the regions of Fig. 8 be denoted by $R_1 = A_1 + B_1 + C_1$, with
similar designations for $R_2$ and $R_3$. Let $R_4 = B_1 + B_2 + B_3 + B_4 + C_1 + C_2$.

If $r = Q_0$, let the non-pooling line between $R_1$ and $R_2$ in Fig. 8 correspond to
$Q_0$ for all $\theta_{21}$. Then $\Pr\{R_4 \mid \theta_{21}, 1\} = \Pr\{R_1 \mid \theta_{21}, 1\}$, whence
$\Pr\{B_2 + B_3 + B_4 + C_2 \mid \theta_{21}, 1\} = \Pr\{A_1 \mid \theta_{21}, 1\}$. By Lemma 2, we have for any $\theta_{21}' > \theta_{21}$,
$\Pr\{B_2 + B_3 + B_4 + C_2 \mid \theta_{21}', 1\} > \Pr\{A_1 \mid \theta_{21}', 1\}$ and
$\Pr\{R_4 \mid \theta_{21}', 1\} > \Pr\{R_1 \mid \theta_{21}', 1\} = Q_0$.



Fig. 8. Critical regions for Result 4.

If $r = P_2$, let the non-pooling line at the lower boundary of $R_3$ in Fig. 8
correspond to $Q_0$ for all $\theta_{21}$. Then in the same way $\Pr\{B_4 \mid \theta_{21}, 1\} = \Pr\{A_1 +
A_2 + A_3 + C_3 \mid \theta_{21}, 1\}$ and $\Pr\{B_4 \mid \theta_{21}', 1\} > \Pr\{A_1 + A_2 + A_3 + C_3 \mid \theta_{21}', 1\}$ by
Lemma 2. Thus $\Pr\{R_4 \mid \theta_{21}', 1\} > \Pr\{R_1 + R_2 + R_3 \mid \theta_{21}', 1\}$ and
$\Pr\{R_4 \mid \theta_{21}', 1\} > \Pr\{R_1 + R_2 \mid \theta_{21}', 1\} = P_2$.

Result 5. For a Class B test, the Type I error is less than $P_3$ for all $\theta_{21}$.

Proof. Figure 7(b) illustrates the critical region of a Class B test. We have
$\Pr\{A + B + C_1 + C_2 + C_3\} = P_3$. But the region of rejection of the "sometimes
pool" test is smaller, excluding $A$.

Result 6. The Type I error of a Class B test, for $\theta_{21} = 1$, is greater than
$(1 - P_1)P_3$.

Proof. Changing $P_2$ to $P_3$ removes $C_2$ from the region of rejection in Fig.
7(b), thus decreasing the Type I error. The modified test lies in both Class B
and Class A, so that Result 2 applies.

Result 7. For any $\theta_{21}$, the Type I error is a minimum for changes of $P_2$ when
$P_2 = P_3$.





Proof. For a Class A test, changing $P_2$ to $P_3$ removes a region of Fig. 7(a),
thus decreasing the Type I error. For a Class B test, changing $P_2$ to $P_3$ removes
region $C_2$ of Fig. 7(b), similarly decreasing the Type I error.

Result 8. A Class A test, in which the Type I error is less than or equal to $P_2$, is
more powerful than a "never pool" test having the same Type I error.

Proof. In Fig. 8, let region $R_1 = A_1 + B_1 + C_1$ be equal in size to $R_4 =
B_1 + B_2 + B_3 + B_4 + C_1 + C_2$. Then $\Pr\{R_4 \mid \theta_{21}, 1\} = \Pr\{R_1 \mid \theta_{21}, 1\}$ and
$\Pr\{B_2 + B_3 + B_4 + C_2 \mid \theta_{21}, 1\} = \Pr\{A_1 \mid \theta_{21}, 1\}$. Increasing $\theta_{32} = 1$ to $\theta_{32}'$ and
applying Lemma 2 yields $\Pr\{R_4 \mid \theta_{21}, \theta_{32}'\} > \Pr\{R_1 \mid \theta_{21}, \theta_{32}'\}$.

Result 9. For a fixed Type I error, a Class A test, carried out at given levels of
$P_1$ and $P_3$, is more powerful than a Class B test at the same levels.

Proof. Fig. 7 and Lemma 2 apply at once.


4.7. Closed form expressions for $n_3 = 2$. The probability of rejecting the
hypothesis in a "sometimes pool" test is given by $Q(\theta_{21}, \theta_{32}) = Q_1 + Q_2$, where
$Q_1$ corresponds to the region $B$, and $Q_2$ to the region $C$ of Fig. 7.

The integrals (4) representing the probability of rejecting the null hypothesis
reduce, when $n_3 = 2$, to


( 8 ) 


Qi = 


1 + 

ft 

Uz 

6 „i 

1 A- 


^ ^21^82] 


Ini 


hj^nz, ^ni) 

i + ? 


where the argument $z$ of the incomplete beta function is defined by $z = x/(1 + x)$,
where


(9) 


X = 


i+'J 


1 + 


0 

Uz 


dll Ozi 


Under the null hypothesis $\theta_{32} = 1$,
(10) Qi =/»(^W2. W 


■< I V 

1 + W 3 


, ^ 

^21 


i "1 


• P 


3j 


since 


1 

(1 + ■ 


Similarly 








where the argument $z'$ of the incomplete beta function is defined by $z' = 1/(1 + x')$




( 13 ) (h-U%An^'Pu 

since 


1 

The incomplete beta function has been tabulated by Pearson [4].

The author wishes to thank Professor W. G. Cochran and Professor John W.
Tukey for helpful advice in the preparation of this paper.


[1] T. A. Bancroft, "On biases in estimation due to the use of preliminary tests of
significance", Annals of Math. Stat., Vol. 15 (1944), pp. 190-204.

[2] Frederick Mosteller, "On pooling data", Jour. Am. Stat. Assn., Vol. 43 (1948),
pp. 231-242.

[3] M. Merrington and C. M. Thompson, "Tables of percentage points of the inverted
beta (F) distribution", Biometrika, Vol. 33 (1943), pp. 73-88.

[4] Karl Pearson, Tables of the Incomplete Beta Function, Cambridge University Press,
1934.



ESTIMATING THE MEAN AND VARIANCE OF NORMAL POPULATIONS
FROM SINGLY TRUNCATED AND DOUBLY TRUNCATED SAMPLES¹

By A. C. Cohen, Jr.

The University of Georgia

1. Summary. This paper is concerned with the problem of estimating the
mean and variance of normal populations from singly and doubly truncated
samples having known truncation points. Maximum likelihood estimating
equations are derived which, with the aid of standard tables of areas and ordinates
of the normal frequency function, can be readily solved by simple iterative
processes. Asymptotic variances and covariances of these estimates are
obtained from the information matrices. Numerical examples are given which
illustrate the practical application of these results. In Sections 3 to 8 inclusive,
the following cases of doubly truncated samples are considered: I, number of
unmeasured observations unknown; II, number of unmeasured observations in
each 'tail' known; and III,² total number of unmeasured observations known,
but not the number in each 'tail'. In Section 9, singly truncated samples are
treated as special cases of I and II above.

2. Introduction. In practice, truncated samples arise with various types of
experimental data in which recorded measurements are available over only a
partial range of the variable. Such samples are usually classified according to
the form of the population (complete) distribution; according to whether the
truncation points are known or unknown; and according to whether the number
of unmeasured (missing) observations is known or unknown. In this paper, the
further classification of singly truncated or doubly truncated is made, accordingly
as one or both 'tails' of the sample have been removed. Pearson and Lee [1, 2],
Fisher [3], Hald [4]³, and this writer [5] studied singly truncated normal samples
with a known truncation point when the number of unmeasured observations is
unknown. Stevens [6], Cochran [7], and Hald [4] studied similar samples with a
known number of unmeasured observations. Stevens [6] also considered doubly
truncated normal samples with known truncation points when the number of
unmeasured observations in each 'tail' is known. In each of these papers,
equations were derived with which maximum likelihood estimates of the population
mean and variance can be computed from samples of the type considered.
With the exception of [6], which uses standard tables of the normal frequency

¹ Based on papers presented before the American Mathematical Society, Durham,
North Carolina, April 2, 1940, and before a joint meeting of the Institute of Mathematical
Statistics and the Biometric Society, Chapel Hill, North Carolina, March 18, 1950.

² The problem involved in this case was recently called to the writer's attention by
Churchill Eisenhart.

³ Reference [4] appeared while this paper was awaiting publication. Minor revisions have
been made in view of Hald's results.




function, practical application of the various estimating equations involves
use of special tables which may frequently be unavailable.


3. Case I. Number of unmeasured observations unknown. Let $x_0$ designate
the left truncation point, $x_0 + R$ the right truncation point, and hence $R$ the
sample range. Let $n_0$ be the number of measured observations with values equal to
or between the truncation points. In this case, the number of unmeasured
observations is assumed to be unknown. We translate the origin to the left terminus
by the change of variable $x = x' - x_0$, and designate the left and right truncation
points in standard units of the population (complete distribution) as $\xi'$ and $\xi''$,
respectively. We can write the probability density function for this case as


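(1)    $f(x) = \dfrac{1}{(I_0' - I_0'')\,\sigma\sqrt{2\pi}} \exp\left\{-\dfrac{1}{2}\left(\xi' + \dfrac{x}{\sigma}\right)^2\right\}, \qquad 0 \le x \le R,$

where

(2)    $I_0(\xi) = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{\xi}^{\infty} e^{-t^2/2}\, dt,$

and

(3)    $\mu = x_0 - \sigma \xi'.$

Thus $(I_0' - I_0'')$ is the area under the normal curve between ordinates erected at
$\xi'$ and $\xi''$ respectively. Moreover $(I_0' - I_0'') = P(x_0 \le x' \le x_0 + R)$. The likelihood
function for such a sample is

(4)    $P(x_1, x_2, \cdots, x_{n_0}) = \left(\dfrac{1}{(I_0' - I_0'')\,\sigma\sqrt{2\pi}}\right)^{n_0} \exp\left\{-\dfrac{1}{2}\displaystyle\sum_{i=1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right)^2\right\}.$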

Since $R$ is the truncated range, and since $\xi'$ and $\xi''$ are in standard units,
we have

(5)    $\xi'' = \xi' + R/\sigma.$


It should be understood that $\xi'$ is considered throughout this paper as the
independent parameter of location. The mean, $\mu$, cf. (3), is a linear function of
$\xi'$. In the derivations which follow, we employ the Fisher $I_n$ functions, where
$I_0(\xi)$ is defined by (2) and

(6)    $I_n(\xi) = \displaystyle\int_{\xi}^{\infty} I_{n-1}(t)\, dt,$

and hence

$\dfrac{dI_n}{d\xi} = -I_{n-1}.$

These functions satisfy the recurrence formula

(7)    $(n + 1) I_{n+1} + \xi I_n - I_{n-1} = 0, \qquad n > -1.$
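The recurrence (7) can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes $I_{-1}(\xi) = \varphi(\xi)$ (the normal ordinate) and takes $I_0(\xi)$ as the upper-tail normal area, and the function names `phi`, `upper_tail`, and `fisher_I` are illustrative choices only.

```python
import math

def phi(x):
    """Ordinate of the standard normal curve (taken here as I_{-1})."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def upper_tail(x):
    """I_0(x): area under the standard normal curve to the right of x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def fisher_I(n, x):
    """I_n(x) for n >= -1, built upward with (n+1) I_{n+1} + x I_n - I_{n-1} = 0."""
    if n == -1:
        return phi(x)
    prev, cur = phi(x), upper_tail(x)          # I_{-1}, I_0
    for k in range(n):                         # step from I_k to I_{k+1}
        prev, cur = cur, (prev - x * cur) / (k + 1)
    return cur

# Numerical check of dI_n/dxi = -I_{n-1} for n = 2 at an arbitrary point.
x, h = 0.7, 1e-5
numeric_derivative = (fisher_I(2, x + h) - fisher_I(2, x - h)) / (2 * h)
print(numeric_derivative, -fisher_I(1, x))
```

The upward recurrence is adequate for small $n$ and moderate $\xi$; it is not intended for large arguments, where cancellation becomes a concern.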





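$I_n(\xi)$ is ordinarily abbreviated to $I_n$ in this paper. Where no confusion seems
likely to occur, similar abbreviations are used for other functions of $\xi$.
We now obtain certain relations for use in subsequent derivations. Equations
(2), (5), and (6) enable us to write

(8)    $\dfrac{\partial I_0'}{\partial \xi'} = -I_{-1}' = -\varphi(\xi'), \qquad \dfrac{\partial I_0''}{\partial \xi'} = -I_{-1}'' = -\varphi(\xi''), \qquad \dfrac{\partial I_0'}{\partial \sigma} = 0, \qquad \dfrac{\partial I_0''}{\partial \sigma} = -I_{-1}''\, \dfrac{\partial \xi''}{\partial \sigma},$

where $\varphi(\xi)$ is the ordinate of the normal frequency curve, i.e., $\varphi(\xi) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\xi^2/2}$.

Ordinarily we abbreviate $\varphi(\xi')$ to $\varphi'$ and $\varphi(\xi'')$ to $\varphi''$. On differentiating (5)
we have

(9)    $\dfrac{\partial \xi''}{\partial \sigma} = -\dfrac{R}{\sigma^2},$

and hence from (8)

$\dfrac{\partial I_0''}{\partial \sigma} = \varphi''\, \dfrac{R}{\sigma^2}.$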


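Taking logarithms of (4), differentiating with the aid of (8) and (9), and
equating to zero, we obtain the maximum likelihood estimating equations

(10)    $\dfrac{\partial L}{\partial \xi'} = \dfrac{n_0(\varphi' - \varphi'')}{I_0' - I_0''} - \displaystyle\sum_{i=1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0,$

        $\dfrac{\partial L}{\partial \sigma} = \left(\dfrac{n_0 \varphi''}{I_0' - I_0''}\right)\dfrac{R}{\sigma^2} - \dfrac{n_0}{\sigma} + \dfrac{1}{\sigma^2}\displaystyle\sum_{i=1}^{n_0} x_i\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0.$

If we define

(11)    $Z_1 = \dfrac{\varphi'}{I_0' - I_0''}, \qquad Z_2 = \dfrac{\varphi''}{I_0' - I_0''},$

and substitute these values in (10), the estimating equations become

(12)    $\sigma[Z_1 - Z_2 - \xi'] - \nu_1 = 0,$

        $\sigma^2[1 - \xi'(Z_1 - Z_2 - \xi') - Z_2 R/\sigma] - \nu_2 = 0,$

where $\nu_1$ and $\nu_2$ are the first and second sample moments referred to the left
terminus; i.e., $\nu_k = \sum x_i^k / n_0$.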

To obtain the required estimates $\hat\sigma$ and $\hat\xi'$, it is necessary to solve the two
equations of (12) simultaneously. As illustrated in Section 7, this can be
accomplished without too much difficulty with the aid of the normal curve tables by
using a modified Newton-Raphson method for solving two equations in two
unknowns. This method is described in greater detail by Whittaker and Robinson
[8]. Note that $Z_1$ and $Z_2$, cf. (11), involve only the normal curve ordinates
$\varphi'$ and $\varphi''$ and the areas $I_0'$ and $I_0''$. Consequently they can be evaluated for any
desired values of $\xi'$ and $\sigma$ from standard tables of the normal frequency function.
To determine $\hat\mu$, substitute $\hat\sigma$ and $\hat\xi'$ in (3).

Throughout this paper, we designate the maximum likelihood estimates as
$\hat\mu$, $\hat\sigma$ and $\hat\xi'$ respectively, whereas corresponding population parameters are
designated as $\mu$, $\sigma$, and $\xi'$.


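4. Case II. Number of unmeasured observations in each 'tail' known. Let
the truncation points, the origin of reference, and the number of measured
observations be designated as for Case I. If we let $n_1$ and $n_2$ be the number of
unmeasured observations in the left and right 'tails' respectively, the likelihood
function for a sample of this type is

(13)    $P(x_1, x_2, \cdots, x_{n_0}) = K\,(1 - I_0')^{n_1}\,(I_0'')^{n_2}\left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{n_0} \exp\left\{-\dfrac{1}{2}\displaystyle\sum_{i=1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right)^2\right\},$

where $K$ is a constant.

We take the logarithms of (13), differentiate with the help of (8) and (9), and
equate to zero to obtain the maximum likelihood estimating equations

(14)    $\dfrac{\partial L}{\partial \xi'} = \dfrac{n_1 \varphi'}{1 - I_0'} - \dfrac{n_2 \varphi''}{I_0''} - \displaystyle\sum_{i=1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0,$

        $\dfrac{\partial L}{\partial \sigma} = \left(\dfrac{n_2 \varphi''}{I_0''}\right)\dfrac{R}{\sigma^2} - \dfrac{n_0}{\sigma} + \dfrac{1}{\sigma^2}\displaystyle\sum_{i=1}^{n_0} x_i\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0.$

Let

(15)    $Y_1 = \dfrac{n_1 \varphi'}{n_0(1 - I_0')}, \qquad Y_2 = \dfrac{n_2 \varphi''}{n_0 I_0''},$

and (14) can be written as

(16)    $\sigma[Y_1 - Y_2 - \xi'] - \nu_1 = 0,$

        $\sigma^2[1 - \xi'(Y_1 - Y_2 - \xi') - Y_2 R/\sigma] - \nu_2 = 0,$

where $\nu_1$ and $\nu_2$ are again the first and second sample moments referred to the
left terminus. The estimating equations (16) correspond to equations (12)
given for Case I, and the manner of solution is the same for both cases. $Y_1$ and
$Y_2$ for a given sample are functions of $\xi'$ and $\sigma$ only. They can be evaluated for
any desired values of these variables from ordinary normal curve tables. As in
Case I, the mean is estimated from (3).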


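5. Case III. Total number of unmeasured observations known, but not the
number in each tail. Again, let the truncation points, the origin of reference,
and the number of measured observations be designated as in the two previous
cases. Let $N$ be the total sample size and hence $N - n_0$ the combined number of
unmeasured observations in both tails. In the notation of Case II, $N - n_0 =
n_1 + n_2$. The likelihood function for a sample of this type is

(17)    $P(x_1, x_2, \cdots, x_{n_0}) = K\,(1 - I_0' + I_0'')^{N - n_0}\left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{n_0} \exp\left\{-\dfrac{1}{2}\displaystyle\sum_{i=1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right)^2\right\}.$

Taking logarithms of (17), differentiating with the assistance of (8) and (9) and
equating to zero, we obtain the maximum likelihood estimating equations

(18)    $\dfrac{\partial L}{\partial \xi'} = \dfrac{(N - n_0)(\varphi' - \varphi'')}{1 - I_0' + I_0''} - \displaystyle\sum_{i=1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0,$

        $\dfrac{\partial L}{\partial \sigma} = \left(\dfrac{(N - n_0)\varphi''}{1 - I_0' + I_0''}\right)\dfrac{R}{\sigma^2} - \dfrac{n_0}{\sigma} + \dfrac{1}{\sigma^2}\displaystyle\sum_{i=1}^{n_0} x_i\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0.$

In this instance, let

(19)    $Q_1 = \dfrac{(N - n_0)\,\varphi'}{n_0(1 - I_0' + I_0'')}, \qquad Q_2 = \dfrac{(N - n_0)\,\varphi''}{n_0(1 - I_0' + I_0'')},$

and (18) can be written as

(20)    $\sigma[Q_1 - Q_2 - \xi'] - \nu_1 = 0,$

        $\sigma^2[1 - \xi'(Q_1 - Q_2 - \xi') - Q_2 R/\sigma] - \nu_2 = 0.$

It will be recognized that equations (20) correspond to (12) and (16) for Cases
I and II respectively. Since the manner of solving the estimating equations is
identical in all three cases, it will not be discussed further here. For any given
sample, $Q_1$ and $Q_2$ are functions of $\xi'$ and $\sigma$ only, and they can be evaluated for
any desired values of these arguments from standard normal curve tables. In
this case also, the mean is estimated from equation (3).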


6. First approximations.

Case I. In this case, the following relations will usually provide satisfactory
first approximations for estimating $\sigma$ and $\xi'$:

(21)    $\sigma_1 = s_x, \qquad \xi_1' = -\nu_1/s_x,$

where $s_x^2$ is the sample variance, i.e., $s_x^2 = (\nu_2 - \nu_1^2)$. It should be remarked
that the only penalty involved in beginning with a poor first approximation is
to increase slightly the number of steps necessary before arriving at a satisfactory
final approximation by the method of Section 7.

Case II. Since $n_1$ and $n_2$ are known in this case, it is more expedient to read
first approximations to $\xi'$ and $\xi''$ directly from standard tables of normal curve
areas where we set

(22)    $1 - I_0(\xi_1') = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^{\xi_1'} e^{-t^2/2}\, dt = \dfrac{n_1}{n_1 + n_0 + n_2},$

and

(23)    $I_0(\xi_1'') = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{\xi_1''}^{\infty} e^{-t^2/2}\, dt = \dfrac{n_2}{n_1 + n_0 + n_2}.$

With $\xi_1'$ and $\xi_1''$ determined from (22) and (23), we obtain a first approximation
for estimating $\sigma$ from equation (5), which we now write as

(24)    $\sigma_1 = R/(\xi_1'' - \xi_1').$

Case III. For a first approximation in this case, it will usually be satisfactory,
in the absence of contrary information, to assume that the unmeasured
observations are divided equally between the two tails, and then proceed as in Case II.
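The first approximations of this section can be sketched in a few lines. This is an illustrative sketch rather than the author's procedure: the inverse normal area, which the paper reads from printed tables, is taken here from Python's `statistics.NormalDist`, and the function names are hypothetical. The sample values in the usage lines are those quoted in Section 7.

```python
import math
from statistics import NormalDist

def first_approx_case1(nu1, nu2):
    """Equation (21): sigma_1 = s_x and xi_1' = -nu1 / s_x."""
    s_x = math.sqrt(nu2 - nu1 ** 2)
    return s_x, -nu1 / s_x

def first_approx_case2(n0, n1, n2, R):
    """Equations (22)-(24); requires n1 > 0 and n2 > 0."""
    total = n1 + n0 + n2
    std = NormalDist()                      # standard normal
    xi1 = std.inv_cdf(n1 / total)           # left-tail area  n1 / total
    xi2 = std.inv_cdf(1.0 - n2 / total)     # right-tail area n2 / total
    return R / (xi2 - xi1), xi1, xi2

def first_approx_case3(N, n0, R):
    """Split the N - n0 unmeasured observations equally, then use Case II."""
    half = (N - n0) / 2.0
    return first_approx_case2(n0, half, half, R)

print(first_approx_case1(1.244625, 2.105275))
print(first_approx_case2(32, 7, 1, 2.75))
print(first_approx_case3(40, 32, 2.75))
```

Because the paper rounds its table look-ups to two or three decimals, the values printed here agree with the quoted first approximations only to about that precision.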


7. Numerical examples. As previously mentioned, a modified Newton-Raphson
method for solving two equations in two unknowns is satisfactory in
each of the three cases considered, for solving the estimating equations to obtain
$\hat\sigma$ and $\hat\xi'$ in practical applications. A random sample from a normal population
with $\mu = 0$, and $\sigma = 1$, selected from Mahalanobis's tables [9] will serve to
illustrate the solution in each case.

Case I. For the sample selected, $n_0 = 32$; $\nu_1 = 1.244625$; $\nu_2 = 2.105275$;
$x_0 = -1.000000$; and $R = 2.750000$. The estimating equations to be solved
simultaneously for $\sigma$ and $\xi'$ are thus

$\sigma[Z_1 - Z_2 - \xi'] - 1.244625 = 0,$

$\sigma^2[1 - \xi'(Z_1 - Z_2 - \xi') - 2.750000\, Z_2/\sigma] - 2.105275 = 0.$

For first approximations, we employ (21) to obtain: $\sigma_1 = s_x = 0.75$; and $\xi_1' =
-1.244625/0.75 = -1.66$. Beginning with these approximations, we subsequently
obtain the results displayed in Table 1.

TABLE 1

Solution of estimating equations in Case I

    σ            ξ' from first equation    ξ' from second equation    Difference

    1.536313     -0.5389                   -0.5387                    -0.0002

    1.527778     -0.5455                   -0.5460                    +0.0005

Interpolating in this table, we obtain $\hat\sigma = 1.534$ and $\hat\xi' = -0.541$. On substituting
these values in (3) we obtain $\hat\mu = -0.170$. Even though the first approximations
in this instance proved to be considerably in error, no appreciable increase was
experienced in the number of steps necessary to arrive at the final values given.
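The Case I computation can be reproduced on a machine. The sketch below is an assumption-laden illustration, not the tabular procedure of the paper: it evaluates the left-hand sides of (12) using `math.erfc` in place of printed normal tables, and solves them with an ordinary Newton iteration using a finite-difference Jacobian and step halving (a stand-in for the "modified Newton-Raphson method" cited from [8]). The starting point `(1.5, -0.5)` is an arbitrary choice near the tabled solution, and all function names are hypothetical.

```python
import math

# Sample quantities quoted for Case I.
NU1, NU2, R, X0 = 1.244625, 2.105275, 2.750000, -1.000000

def phi(x):
    """Ordinate of the standard normal curve."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def upper_tail(x):
    """I_0(x): standard normal area to the right of x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def residuals(sigma, xi):
    """Left-hand sides of the two estimating equations (12)."""
    xi2 = xi + R / sigma                          # equation (5)
    area = upper_tail(xi) - upper_tail(xi2)       # I_0' - I_0''
    z1, z2 = phi(xi) / area, phi(xi2) / area      # equation (11)
    g = z1 - z2 - xi
    return sigma * g - NU1, sigma ** 2 * (1.0 - xi * g - z2 * R / sigma) - NU2

def newton(sigma, xi, tol=1e-9, max_iter=100, h=1e-6):
    """Newton iteration in (sigma, xi) with a central-difference Jacobian."""
    for _ in range(max_iter):
        f1, f2 = residuals(sigma, xi)
        norm = math.hypot(f1, f2)
        if norm < tol:
            break
        fs_p, fs_m = residuals(sigma + h, xi), residuals(sigma - h, xi)
        fx_p, fx_m = residuals(sigma, xi + h), residuals(sigma, xi - h)
        j11 = (fs_p[0] - fs_m[0]) / (2 * h)
        j21 = (fs_p[1] - fs_m[1]) / (2 * h)
        j12 = (fx_p[0] - fx_m[0]) / (2 * h)
        j22 = (fx_p[1] - fx_m[1]) / (2 * h)
        det = j11 * j22 - j12 * j21
        d_sigma = (-f1 * j22 + f2 * j12) / det    # Cramer's rule for J d = -f
        d_xi = (-j11 * f2 + j21 * f1) / det
        step = 1.0
        while step > 1e-6:                        # halve until the residual shrinks
            s_new, x_new = sigma + step * d_sigma, xi + step * d_xi
            if s_new > 0 and math.hypot(*residuals(s_new, x_new)) < norm:
                sigma, xi = s_new, x_new
                break
            step *= 0.5
        else:
            break                                 # no improving step found
    return sigma, xi

sigma_hat, xi_hat = newton(1.5, -0.5)
mu_hat = X0 - sigma_hat * xi_hat                  # equation (3)
print(round(sigma_hat, 3), round(xi_hat, 3), round(mu_hat, 3))
```

The two curves defined by (12) cross at a shallow angle in this example, so small changes in the tabulated areas move the solution noticeably; agreement with Table 1 should be expected to about the third decimal only.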

Case II. Solution of estimating equations (16) for this case can also be
illustrated with the same sample which was used in Case I. In this instance, however,
we have the additional information: $n_1 = 7$ and $n_2 = 1$. The equations to be
solved are:

$\sigma[Y_1 - Y_2 - \xi'] - 1.244625 = 0,$

$\sigma^2[1 - \xi'(Y_1 - Y_2 - \xi') - 2.750000\, Y_2/\sigma] - 2.105275 = 0.$

From (22), (23) and (24) we obtain the first approximations: $\xi_1' = -0.935$;
$\xi_1'' = 1.960$; and hence $\sigma_1 = 0.950$. Beginning with these values, we proceed as
in Case I, and after several trials obtain the results displayed in Table 2.

TABLE 2

Solution of estimating equations in Case II

    σ            ξ' from first equation    ξ' from second equation    Difference

    1.041667     -0.9381                   -0.9360                    -0.0021

    1.000000     -0.9820                   -1.0094                    +0.0274

Interpolating, we have $\hat\sigma = 1.039$ and $\hat\xi' = -0.941$. From (3) we then obtain
$\hat\mu = -0.022$.
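As a check on the Case II figures, the left-hand sides of (16) can be evaluated at the interpolated estimates. This is a minimal sketch under the reading of (15) as $Y_1 = n_1\varphi'/[n_0(1 - I_0')]$ and $Y_2 = n_2\varphi''/(n_0 I_0'')$; the function names are hypothetical and `math.erfc` replaces the printed normal tables.

```python
import math

def phi(x):
    """Ordinate of the standard normal curve."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def lower_tail(x):
    """1 - I_0(x): standard normal area to the left of x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def upper_tail(x):
    """I_0(x): standard normal area to the right of x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def case2_residuals(sigma, xi, nu1, nu2, R, n0, n1, n2):
    """Left-hand sides of the estimating equations (16)."""
    xi2 = xi + R / sigma                              # equation (5)
    y1 = (n1 / n0) * phi(xi) / lower_tail(xi)         # equation (15)
    y2 = (n2 / n0) * phi(xi2) / upper_tail(xi2)
    g = y1 - y2 - xi
    return sigma * g - nu1, sigma ** 2 * (1.0 - xi * g - y2 * R / sigma) - nu2

# Residuals at the estimates interpolated from Table 2.
print(case2_residuals(1.039, -0.941, 1.244625, 2.105275, 2.75, 32, 7, 1))
```

Both residuals should be small compared with $\nu_1$ and $\nu_2$; they are not exactly zero because the estimates are rounded to three decimals.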

Case III. Again we use the same sample that was employed to illustrate
Cases I and II. In this instance, however, we assume that the only information
available about the unmeasured observations is that their total number is 8.
In the notation of Section 5, we have $N = 40$, $n_0 = 32$, and hence $N - n_0 = 8$.

The estimating equations in this situation are

$\sigma[Q_1 - Q_2 - \xi'] - 1.244625 = 0,$

$\sigma^2[1 - \xi'(Q_1 - Q_2 - \xi') - 2.750000\, Q_2/\sigma] - 2.105275 = 0.$

Under the assumption that 4 unmeasured observations are in each 'tail',
equations (22), (23) and (24) give first approximations: $\xi_1' = -1.28$; $\xi_1'' = 1.28$;
and hence $\sigma_1 = 1.074$. Starting with these values and proceeding as in the two
previous cases, we obtain the results displayed in Table 3.

TABLE 3

Solution of estimating equations in Case III

    σ            ξ' from first equation    ξ' from second equation    Difference

    1.000000     -1.0794                   -1.2091                    +0.1297

    1.100000     -1.0118                   -0.9739                    -0.0379

By interpolation, we have $\hat\sigma = 1.077$ and $\hat\xi' = -1.027$. From equation (3),
we then compute $\hat\mu = 0.106$.
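The same kind of check applies to Case III. The sketch below assumes the reading of (19) as $Q_1 = (N - n_0)\varphi'/[n_0(1 - I_0' + I_0'')]$ and $Q_2 = (N - n_0)\varphi''/[n_0(1 - I_0' + I_0'')]$, evaluates the left-hand sides of (20) at the interpolated estimates, and recomputes the mean from (3). Function names are hypothetical.

```python
import math

def phi(x):
    """Ordinate of the standard normal curve."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def upper_tail(x):
    """I_0(x): standard normal area to the right of x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def case3_residuals(sigma, xi, nu1, nu2, R, n0, N):
    """Left-hand sides of the estimating equations (20)."""
    xi2 = xi + R / sigma                                  # equation (5)
    outside = 1.0 - upper_tail(xi) + upper_tail(xi2)      # 1 - I_0' + I_0''
    weight = (N - n0) / n0
    q1 = weight * phi(xi) / outside                       # equation (19)
    q2 = weight * phi(xi2) / outside
    g = q1 - q2 - xi
    return sigma * g - nu1, sigma ** 2 * (1.0 - xi * g - q2 * R / sigma) - nu2

sigma_hat, xi_hat, x0 = 1.077, -1.027, -1.0
print(case3_residuals(sigma_hat, xi_hat, 1.244625, 2.105275, 2.75, 32, 40))
mu_hat = x0 - sigma_hat * xi_hat                          # equation (3)
print(round(mu_hat, 3))
```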


8. Precision of estimates. To determine asymptotic variances and covariances
of the estimates, we construct the variance-covariance matrices. This requires
second partial derivatives of the logarithms of the likelihood functions in each of
the three cases considered. Results stated in (8) and (9) are involved in these
derivatives.

Case i. The second partial derivatives in this case are 


(25) ^ ?io/'i(f',f''), 

where 


d^L 

d^'dir 


^ f ft' £"1 ^ fio , / 

“fsit , i ), — = fJt' t "). 


{") - -[1 4- ez, _ f'% _ (z, - Z,)\ 

(26) 4") = {; Z^iiZi - Z^) - f"] + [2, _ 

n = 2,(^5 + f'O - j^2 - j'(2, - _ f') _ 




Subsequently we obtain 
(27) ' 


-/l " 

Tr/iA f 

r - /» 1 

L/i/i - f\\ 

, K (f ) = _ 

no 

: 

M M 
(_ 


Case ii. In this case the second partial derivatives are 
(28) ~ - no gy{^\ f'O, 

where 


V/l/o’ 




i") = - I 1 + i'Fi - {"Tj + -“ K? + fO 

m “J' 

] + [Y, -r,- i'l}. 


(29) ff2(?', n = (-Y. 


,~Y,~ 

<r ijli ^ 


Finally we can write 


(30) F(^) = 


no 




9t 


Case III. This time, the second partial derivatives


Vffi 


iQi 


are 





where 


1 +t'Qi - + 


no 
N ~ 


no 


(32) 


/h(t', n = - 

“ {f «> [(/?^) <e. - W - £ 

W£',i")-0S=(r-j^Q.) 


(ft - ft)^ 




+ [ft — Qi ~ ^'] 


R 


~ 2 - r(ft - Q2 - n - ft - 


Accordingly we obtain 


(33) y(«-- 


-ki 


hi hz “■ ho 






^/hihi' 


Note that variances of the estimates for each case considered can be computed
for given values of $\xi'$ and $\sigma$ from standard normal tables of areas and ordinates.

9. Singly truncated samples. If only the left 'tail' is missing from the samples
thus far considered, then $\xi'' = \infty$, $n_2 = 0$, $\varphi'' = 0$, $I_0'' = 0$, and hence $Z_2$, $Y_2$,
and $Q_2$ each equal zero. Upon substituting these values in (12), (16), and (20)
respectively, estimating equations applicable to singly truncated samples are
obtained as special cases of the estimating equations for doubly truncated
samples. Of course Cases II and III become identical when samples are singly
truncated. When $n_2 = I_0'' = 0$, then $Y_1 = Q_1$, cf. (15) and (19).

Case I. With $Z_2 = 0$, the estimating equations (12) become

(34)    $\sigma[Z_1 - \xi'] = \nu_1,$

        $\sigma^2[1 - \xi'(Z_1 - \xi')] = \nu_2.$

Eliminating $\sigma$ between these two equations we have

(35)    $\dfrac{\nu_2}{\nu_1^2} = \dfrac{1}{Z_1 - \xi'}\left[\dfrac{1}{Z_1 - \xi'} - \xi'\right],$

which is recognized as the Pearson-Lee-Fisher equation in a form which was
previously given by the author [5].
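Equation (35) involves $\xi'$ alone, so it can be solved by a one-dimensional search. The sketch below is illustrative only: it uses bisection rather than the tabular method of [5], and it assumes the right-hand side of (35) increases steadily with $\xi'$ over the search interval, which is not established in the text. The moments in the usage lines are hypothetical values generated from a known $(\sigma, \xi')$ so that the round trip can be seen; function names are also hypothetical.

```python
import math

def phi(x):
    """Ordinate of the standard normal curve."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def upper_tail(x):
    """I_0(x): standard normal area to the right of x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def z1_minus_xi(xi):
    """Z_1 - xi' with Z_1 = phi' / I_0' (the case I_0'' = 0)."""
    return phi(xi) / upper_tail(xi) - xi

def moment_ratio(xi):
    """Right-hand side of (35)."""
    d = z1_minus_xi(xi)
    return (1.0 / d) * (1.0 / d - xi)

def solve_singly_truncated(nu1, nu2, lo=-6.0, hi=6.0, tol=1e-12):
    """Bisection on (35) for xi', then sigma from the first equation of (34)."""
    target = nu2 / nu1 ** 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if moment_ratio(mid) < target:    # assumes the ratio increases with xi'
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    xi = 0.5 * (lo + hi)
    return nu1 / z1_minus_xi(xi), xi

# Hypothetical moments generated from sigma = 2, xi' = -0.5 via (34).
d = z1_minus_xi(-0.5)
nu1, nu2 = 2.0 * d, 4.0 * (1.0 - (-0.5) * d)
print(solve_singly_truncated(nu1, nu2))
```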

Case II. With $Y_2 = 0$, the estimating equations (16) become

(36)    $\sigma[Y_1 - \xi'] = \nu_1,$

        $\sigma^2[1 - \xi'(Y_1 - \xi')] = \nu_2.$

Eliminating $\sigma$ between the above equations, we obtain

(37)    $\dfrac{\nu_2}{\nu_1^2} = \dfrac{1}{Y_1 - \xi'}\left[\dfrac{1}{Y_1 - \xi'} - \xi'\right],$

which is in a form completely analogous to (35). Furthermore, this equation
can be solved for $\xi'$ in the same manner as (35), cf. [5]. Since $\sigma$ can be eliminated
between estimating equations in singly truncated cases, but not in doubly
truncated cases, the numerical computations are much simpler and less laborious
for singly truncated samples.

If the right rather than the left tail is missing from singly truncated samples,
applicable estimating equations can be obtained from (12) and (16) by translating
the origin to the terminus on the right and setting $Z_1$ and $Y_1$ equal to zero
rather than $Z_2$ and $Y_2$.



The variance formulas (25) and (28) likewise assume more simple forms with
singly truncated samples. Substitute $Z_2 = 0$ in (25) and the variance formulas
applicable with singly truncated samples when the number of unmeasured
observations is unknown become identical in form with those previously given
by the writer in [5]. When the number of unmeasured observations in a singly
truncated sample is known, the applicable variance formulas (28), on setting
$Y_2 = 0$, become

(3Ki r/.r, - '"‘ll-fr) and 17^') « 

n Ti 

where $W$ and $w$ may be considered as weighting functions defined by
mi irteM - _-_ 


„ 2 - - O_ 

wu _ ,,, 5 .^ _ _ (J-, _ tf 

Similarly, the correlation between sampling errors of $\hat\sigma$ and $\hat\xi'$ in this case becomes

__ n~s' 

__ ^ ndWaVd- r)l’ 

A comparison of the variances (38), with those applicable when the number of
unmeasured observations is unknown, serves to indicate the extent to which
information contained in a singly truncated sample is increased by adding
knowledge of the number of unmeasured observations. To facilitate such
comparisons, $W$, $w$, and corresponding functions $W'$ and $w'$ applicable when the
number of unmeasured observations is unknown, are displayed graphically in
Figures 1 and 2. In computing the plotted values of $W$ and $w$, the ratio $n/N$
in (39) and (40) was replaced by $I_0$. This ratio is, of course, an estimate of $I_0$,
and for $n$ and $N$ sufficiently large, the substitution is amply justified. Equations
for $W'$ and $w'$ can be found in [5]. For further comparisons, a graph of $w''$
applicable in determining the variance $V(\xi^*)$, where $\xi^*$ is estimated from $n/N$ alone,
is also included in Figure 2. This latter function is defined as

(42) w"(^*) = . 

<r 

It follows from the well known formula for the variance of $\xi^*$:

1 ihi\ - h)\ _ 1 //5(1 - /o)\ 


An examination of Figures 1 and 2 discloses that except when the omitted
portion of the distribution is small ($\xi' < -3$), the variances of the estimates of
$\sigma$ and $\xi'$ based on singly truncated normal samples are substantially less when
the number of unmeasured observations is known than when this information
is lacking.





REFERENCES 

[1] K. Pearson and A. Lee, "On the generalized probable error in multiple normal
correlation", Biometrika, Vol. 6 (1908), pp. 59-68.

[2] A. Lee, "Table of Gaussian 'tail' functions when the 'tail' is larger than the body",
Biometrika, Vol. 10 (1915), pp. 208-215.

[3] R. A. Fisher, "Properties and applications of Hh functions", Mathematical Tables,
Vol. 1, pp. xxvi-xxxv, British Association for the Advancement of Science, 1931.

[4] A. Hald, "Maximum likelihood estimation of the parameters of a normal distribution
which is truncated at a known point", Skandinavisk Aktuarietidskrift, Vol. 32
(1949), pp. 119-134.

[5] A. C. Cohen, Jr., "On estimating the mean and standard deviation of truncated normal
distributions", Jour. Am. Stat. Assn., Vol. 44 (1949), pp. 518-525.

[6] W. L. Stevens, "The truncated normal distribution", appendix to "The Calculation
of the Time-Mortality Curve" by C. I. Bliss, Annals of Applied Biology, Vol. 24
(1937), pp. 815-852.

[7] W. G. Cochran, "Use of IBM equipment in an investigation of the truncated normal
problem", Proc. Research Forum, International Business Machines Corp., 1946,
pp. 40-43.

[8] E. T. Whittaker and G. Robinson, The Calculus of Observations, Second Ed., Blackie
and Son, Ltd., London and Glasgow, 1929, pp. 88-91.

[9] P. C. Mahalanobis, "Tables of random samples from a normal population", Sankhya,
Vol. 1 (1934), pp. 289-328.



THE ASYMPTOTIC PROPERTIES OF ESTIMATES OF THE 
PARAMETERS OF A SINGLE EQUATION IN A COMPLETE 
SYSTEM OF STOCHASTIC EQUATIONS¹,²

By T. W. Anderson³ and Herman Rubin⁴

Columbia University and Institute for Advanced Study

1. Summary. In a previous paper [2] the authors have given a method for
estimating the coefficients of a single equation in a complete system of linear
stochastic equations. In the present paper the consistency of the estimates and
the asymptotic distributions of the estimates and the test criteria are studied
under conditions more general than those used in the derivation of these estimates
and criteria. The point estimates, which can be obtained as maximum likelihood
estimates under certain assumptions including that of normality of disturbances,
are consistent even if the disturbances are not normally distributed and (a) some
predetermined variables are neglected (Theorem 1) or (b) the single equation is
in a non-linear system with certain properties (Theorem 2).

Under certain general conditions (normality of the disturbances not being
required) the estimates are asymptotically normally distributed (Theorems 3
and 4). The asymptotic covariance matrix is given for several cases. The criteria
derived in [2] for testing the hypothesis of over-identification have,
asymptotically, $\chi^2$-distributions (Theorem 5). The exact confidence regions developed
in [2] for the case that all predetermined variables are exogenous (that is, that
the difference equations are of zero order) are shown to be consistent and to hold
asymptotically even when this assumption is not true (Theorem 6).

2. Introduction. The complete system of linear stochastic equations
considered by the authors in [2] was written

(2.1)    $B_{yy}\, y_t' + \Gamma_{yz}\, z_t' = \epsilon_t',$

where $y_t$ is a row vector of $G$ jointly dependent variables at "time" $t$, $z_t$ is a row
vector of $K$ variables predetermined at $t$, and $\epsilon_t$ is a row vector of "disturbances,"
and $B_{yy}$ and $\Gamma_{yz}$ are matrices. If $B_{yy}$ is non-singular the distribution of $\epsilon_t$ induces
the distribution of $y_t$ given $z_t$.

One component equation of (2.1) was given special treatment. Let $\beta$ be

¹ This paper will be included in Cowles Commission Papers, New Series, No. 30.

² The results of this paper were presented to meetings of the Institute of Mathematical
Statistics at Washington, D. C., April 12, 1946 (Washington Chapter) and at Ithaca, New
York, August 23, 1946. Most of the research was done at the Cowles Commission for
Research in Economics; the authors are indebted to the members of the Cowles Commission
staff for many helpful discussions.

³ Fellow of the John Simon Guggenheim Memorial Foundation; Research Consultant
of the Cowles Commission for Research in Economics.

⁴ National Research Fellow; Research Consultant of the Cowles Commission for
Research in Economics.




composed of the coefficients of the coordinates of $y_t$ which are not assumed
zero in the specified equation, and let $x_t$ be composed of the corresponding
components of $y_t$; similarly let $\gamma$ be composed of the coefficients of the coordinates
of $z_t$ which are not assumed zero, and $u_t$ the corresponding components of $z_t$;
and let $\zeta_t$ be the component of $\epsilon_t$ associated with the specified equation. Then
the single equation is

(2.2)    $\beta x_t' + \gamma u_t' = \zeta_t.$

Suppose we have a set of observations $x_t$, $z_t$, $t = 1, \cdots, T$. For sets of any
two vectors $a_t$ and $b_t$, let the second-order moment matrix be

(2.3)    $M_{ab} = \dfrac{1}{T}\displaystyle\sum_{t=1}^{T} a_t' b_t.$

Let $s_t$ be some linear transform of $v_t$, the set of coordinates of $z_t$ not contained in
$u_t$, chosen so $M_{su} = 0$. Defining

(2.4)    $W_{xx} = M_{xx} - M_{xz} M_{zz}^{-1} M_{zx},$

and assuming $\epsilon_t$ normally distributed with mean 0, covariance matrix $\Sigma$, and
independently of $\epsilon_{t'}$ ($t \ne t'$), we find the maximum likelihood estimate of $\beta$
to be proportional to a vector $b$ defined by

(2.5)    $(M_{xs} M_{ss}^{-1} M_{sx} - \nu W_{xx})\, b' = 0,$

taking $\nu$ as the smallest root of

(2.6)    $|M_{xs} M_{ss}^{-1} M_{sx} - \nu W_{xx}| = 0.$

The vector is normalized by

(2.7)    $b\, \Phi\, b' = 1,$

where $\Phi$ may be a function of the estimates of other parameters. The estimate
of $\gamma$ is given in [2; Theorem 1]. These estimates were derived under the
following explicit Assumptions A, B, C, and D:

Assumption A. The selected structural equation (2.2) is one equation of a complete
linear system of stochastic equations. It is identified by the fact that if $H$ is the
number of coordinates in $x_t$, there are at least $H - 1$ coordinates in $v_t$, the vector of
predetermined variables in the system, but missing in (2.2).

Assumption B. At time $t$ all of the coordinates of $z_t = (u_t, v_t)$ are given.

Assumption C. The coordinates of $z_t$ are given functions of exogenous variables
and of coordinates of $y_{t-1}, y_{t-2}, \cdots$. If coordinates of $y_0, y_{-1}, \cdots$ are involved in
$z_t$, they will be considered as given numbers. The moment matrix $M_{zz}$ is non-singular
with probability one.

Assumption D. The disturbance vectors are distributed serially independently
and normally with mean zero and covariance matrix $\Sigma$.
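The computation behind (2.5) and (2.6) is a generalized characteristic-root problem. The sketch below is a hedged illustration for the smallest possible case, $H = 2$: it takes two symmetric $2 \times 2$ matrices, `A` standing for $M_{xs} M_{ss}^{-1} M_{sx}$ and `W` standing for $W_{xx}$ (assumed positive definite), and finds the smallest root $\nu$ of $|A - \nu W| = 0$ from the quadratic in $\nu$, together with a vector $b$ satisfying $(A - \nu W)b' = 0$. The numerical matrices are hypothetical, no normalization is applied, and a general-size problem would need a proper eigenvalue routine.

```python
import math

def smallest_root_2x2(A, W):
    """Smallest root nu of det(A - nu W) = 0 for symmetric 2x2 A and W (W positive definite)."""
    (a11, a12), (_, a22) = A
    (w11, w12), (_, w22) = W
    det_w = w11 * w22 - w12 * w12
    det_a = a11 * a22 - a12 * a12
    mixed = a11 * w22 + a22 * w11 - 2.0 * a12 * w12
    # det(A - nu W) = det_w * nu^2 - mixed * nu + det_a
    disc = max(mixed * mixed - 4.0 * det_w * det_a, 0.0)
    return (mixed - math.sqrt(disc)) / (2.0 * det_w)

def null_vector_2x2(A, W, nu):
    """A vector b with (A - nu W) b' = 0, read off the first row (unnormalized)."""
    c11 = A[0][0] - nu * W[0][0]
    c12 = A[0][1] - nu * W[0][1]
    return (c12, -c11)

# Hypothetical moment matrices, for illustration only.
A = [[2.0, 1.0], [1.0, 2.0]]
W = [[2.0, 0.0], [0.0, 1.0]]
nu = smallest_root_2x2(A, W)
b = null_vector_2x2(A, W, nu)
print(nu, b)
```

Reading $b$ from the first row fails when that row of $A - \nu W$ is identically zero; in that degenerate situation the second row should be used instead.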

Under these assumptions it is found that $(1 + \nu)^{-T/2}$ is the likelihood ratio
criterion for testing the hypothesis that the number of components of $z_t$ assumed
to have zero coefficients is so great.

If there are no lagged endogenous variables in $z_t$, we can find confidence
regions for $\beta$ and for $\beta$ and $\gamma$ simultaneously as well as an approximate test for
the above hypothesis. The assumptions used for these results are A, B, and

Assumption E. All the coordinates of $z_t = (u_t, v_t)$ are exogenous. The moment
matrix $M_{zz}$ is non-singular. The disturbances of the selected equation are distributed
independently and normally with mean zero and variance $\sigma^2$.

Assumptions A and B are used in this paper and a number in addition,
which will be lettered similarly. It is to be emphasized that the various
assumptions are used alternatively, never all at once; in fact many assumptions are
mutually exclusive.

3. Consistency of the estimates. Tlie estimates $ and are consistent not 
only in the case for which they are maximum likelihood estimates, but also in 
ca^es ip which the disturbances are not normaUy or even identically distributed. 
Moreover, for consistency of the estimates it is not necessary that the investigator 
know all of the components of Vt or use them. Another direction in which the 
assumptions may be relaxed is to permit the other equations in the system to be 
non-linear. 

"3.1. The linear case, This case is characterized by Assumption A. We need 
also to assume: 

Assumption F. M„ converges to a fixed non-singular limit R in probability. 

Let u_t consist of the part of z_t that enters the selected structural equation (2.2).
The remainder of the components of z_t are divided into two groups as to whether
they are known or not. Let c_t be a linear transform of the known components
not entering the specified equation such that

(3.1)    plim M_uc = 0    (T → ∞),

and let r_t be a linear transform of the components of z_t not known such that

(3.2)    plim M_ur = 0    (T → ∞),

(3.3)    plim M_cr = 0    (T → ∞).

The relevant part of the “reduced form,” obtained from (2.1) by multiplication 
by is 

(3.4)    x_t = Π_xu u_t + Π_xc c_t + Π_xr r_t + δ_t.

The matrix (Π_xu Π_xc Π_xr) is Π_xz (defined in [2]) multiplied on the right by a
non-singular matrix; hence, βΠ_xc = 0, and similarly βΠ_xu = -γ. We shall find it
convenient to assume

Assumption G. Π_xc has rank H - 1.

This means that for T sufficiently large the probability is arbitrarily near 1
that (2.2) is identified.





However, these conditions still do not insure consistency. We need the asymptotic
analogue of lack of correlation:

Assumption H. 

1 ’’ 

plim = d- 

7-^00 1 1 


We do not need to require that the covariance matrices of δ_t are the same or
even that they exist. We shall make an assumption about


(3,5) 







Mu^Y 

mJ 



Assumption I. The ratio of the largest to the smallest characteristic roots of W_xx
is bounded in probability.

This means that for a suitable constant K 


(3.6) 


lim P 




= 0 , 


where P(E) denotes the probability of event E and s(A) and l(A) are the smallest
and largest roots of the matrix A, respectively.

Assumptions F and H imply that P_xu → Π_xu and P_xc → Π_xc in probability,
where P_xu = M_xu M_uu^{-1} and P_xc is the part of


(3.7) 


(MxuMm) 



m,.Y 

MxJ 


corresponding to the vector c_t. The first assertion follows because
P_xu = (Π_xu M_uu + Π_xc M_cu + Π_xr M_ru + M_δu) M_uu^{-1}, and M_cu → 0, M_ru → 0, and M_δu → 0
in probability by (3.1), (3.2) and Assumption H; the second assertion follows
similarly. Since matrix multiplication is continuous, and the characteristic roots
of a matrix are continuous functions of the matrix,

(3.8) plim s[PioAr«Pm] = 0, 


where J¥.. = (M„ - This follows fiom the wdl-known theorem 

(a proof of which is given in [4]) that if a random vector X_T converges
stochastically to X, then f(X_T) converges stochastically to f(X) if f(y) is continuous.

We shall find the following lemmas convenient. The proofs are simple and
are omitted.

Lemma 1. Let B be positive definite, A positive semi-definite. Then the smallest
root ν of |A - xB| = 0 is less than or equal to s(A)/s(B).


See Section 4 of [2].

Because of the assertion above and Assumptions F and G only one characteristic root
of the matrix approaches zero in probability.





Lemma 2. Each element of a positive definite matrix is less in absolute value
than the largest characteristic root.
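
The two lemmas can be checked numerically in the simplest case. The following is a minimal sketch for symmetric 2 x 2 matrices only, using closed-form characteristic roots; the function names and the example matrices are illustrative assumptions and are not taken from the paper.

```python
import math


def sym_roots(a11, a12, a22):
    """Smallest and largest characteristic roots of [[a11, a12], [a12, a22]]."""
    mean = (a11 + a22) / 2.0
    radius = math.hypot((a11 - a22) / 2.0, a12)
    return mean - radius, mean + radius


def smallest_generalized_root(A, B):
    """Smallest root x of |A - xB| = 0 for symmetric 2 x 2 A and positive definite B.

    Each matrix is passed as the triple (m11, m12, m22).
    """
    (a11, a12, a22), (b11, b12, b22) = A, B
    # det(A - xB) = det(B) x^2 - q x + det(A)
    det_a = a11 * a22 - a12 ** 2
    det_b = b11 * b22 - b12 ** 2
    q = a11 * b22 + a22 * b11 - 2.0 * a12 * b12
    disc = max(q * q - 4.0 * det_a * det_b, 0.0)
    return (q - math.sqrt(disc)) / (2.0 * det_b)


# Hypothetical example matrices (both positive definite).
A = (2.0, 1.0, 1.0)
B = (3.0, 0.5, 2.0)

s_A, l_A = sym_roots(*A)
s_B, l_B = sym_roots(*B)
nu = smallest_generalized_root(A, B)

# Lemma 1: the smallest root of |A - xB| = 0 does not exceed s(A)/s(B).
lemma1_holds = nu <= s_A / s_B + 1e-12
# Lemma 2: every element of B is smaller in absolute value than l(B).
lemma2_holds = max(abs(v) for v in B) < l_B
```

The check is only an illustration for one pair of matrices; it does not replace the (omitted) proofs.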

Let ν be the smallest root of

(3.9) 1 pm.bL ~ plK, I = 0. 

Then plim ν W_xx = 0 as T → ∞. This statement follows from (3.8) and Lemmas 1 and 2.

Since 0 is a simple characteristic root of !!„ plim , it follows from (3.9) 

T*-*^ 

and the consistency of P_xu and P_xc, that β̂ approaches β apart from normalization.
The following theorem results directly:

Theorem 1. Under Assumptions A, F, G, H, and I, and if plim β Φ_xx β' = 1 as T → ∞,

(3.10)    plim β̂ = β    (T → ∞),

(3.11)    plim γ̂ = γ    (T → ∞),

where β̂ and γ̂ are calculated as if r_t = 0 and as if the remainder of A, B, C, and D
were satisfied.

3.2. The non-linear case. In this section we apply the estimates obtained in [2]
to an equation of a complete system in which the remaining equations may be
non-linear. We replace Assumption A by the following assumption:

Assumption J. The selected structural equation (2.2) is one equation of a complete
system of stochastic equations:

(3.11)    F_i(y_t, z_t) = ε_ti    (i = 1, ..., G).

Let us solve the complete system (3.11) for the components of y_t. We obtain

(3.12)    y_tj = h_j(z_t, ε_t).

Let u_t be the subvector of z_t occurring in the selected structural equation.
Let c_t be a vector function of z_t such that plim M_cu = 0 as T → ∞. We may write (3.12)
for those y's occurring in the selected structural equation as

(3.13)    x_t = Π_xu u_t + Π_xc c_t + p(z_t, ε_t),

where the components of p(z_t, ε_t) are the residuals from the formal limiting
regression of x_t on u_t and c_t. The proof of Theorem 1 can be used to prove the
following:

Theorem 2. If Assumptions F, G, H, I, and J are satisfied with z_t replaced by
(u_t, c_t) and δ_t replaced by p(z_t, ε_t), r_t = 0, and if plim β Φ_xx β' = 1, then

(3.14)    plim β̂ = β,

(3.15)    plim γ̂ = γ.


I Thifl follows from the above statements because C and -y are (vector-valued) rational 
functions of M„ , Pi, , W*! and which approach limits in probability. 





4. The asymptotic distribution of the estimates. 

4.1. The asymptotic distribution of and P^u ■ To obtain the asymptotic 
distribution of the estimates we need stronger assumptions. Throughout Sections
4.1 and 4.2 we use Assumptions A, B, F, H, I, and the following:

Assumption K. The exogenous variables are bounded; the vector of disturbances
of the complete system has mean zero, and is serially independent; for some λ > 0
and some M, E(|δ_ti|^{4+λ}) < M; the coordinates of z_t may be linear combinations of
lagged endogenous variables. If the endogenous part of a coordinate is


eo Q 

S ! 7 t. 2 /(- 
r“"l ^*1 


then 


and 


oQ (3 
T-I t=l 


fi'r. 


< CO 


22 ^2 Orxl/t—T^i 
paat t»“l 

is hounded. 

Assumption L. The matrix Φ_xx is known and constant.

Assumption M. For each i, j, k, l, 1 ≤ i, j ≤ H, 1 ≤ k, l ≤ K,

1 ^ 

lim ■=, &(St,5 1 ]zII.zli) — Sijki 

I"-.03 i <-1 

exists. 

Let the components of M_yy, M_yz, M_zz be arranged as a vector m(T) with
mean value μ(T). It has been shown [3] that √T(m(T) - μ(T)) is asymptotically
distributed according to N(0, Σ), the normal distribution with mean 0 and
covariance matrix Σ composed of elements

σ_ij = lim E{T[m_i(T) - μ_i(T)][m_j(T) - μ_j(T)]}    (T → ∞).

In conjunction with this result we make repeated use of a special case of Theorem 

^ lippese Vf{z,r - O' = 1. • • • . «) asymptotic distribution 

JV(0, 'T) with iitbeing functions of T such thatl^^ = 0 • hrizi • . 2nJ 

fc r / \ 

be random BorelmieasuraUe funciims of n real variables such that — = 

exists with probability one for T snfficimtly large and z m a fixed neighbor^ 
hood of f. and suppose that there exist numbers a*, such that for any e > , 

and X > 0. P( 9 up , 1 *) -VProaxhes zero. Then if 

= f. If,™ • ' the random variables 

'Vfiyj-'^ZT) have the joint "asymptotic distribution iV(0, A-^A'), where A = 

(“.;)■ 





To obtain the asymptotic distributions we have only to verify that the assumptions
of this statement are satisfied, and compute A, since the asymptotic
distribution is characterized completely by AΨA'. We shall denote the element in
the i-th row and j-th column of AΨA' by σ(f_i, f_j). We shall find it convenient
to use the notation df = A dx; that is, the differential df is defined in terms of the
limit matrix A.
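
The role of the matrix A can be illustrated with a small simulation. The sketch below is an assumption-laden toy case, not the authors' setting: a single sample mean of exponential variables (mean 1, variance 1) is transformed by f(z) = z^2, so that A reduces to the scalar derivative 2 and Ψ to the variance 1. The predicted asymptotic variance AΨA' is then compared with the simulated variance of √T(f(mean) - f(1)). The function name, sample sizes and seed are hypothetical.

```python
import math
import random
import statistics


def simulate_scaled_errors(T, reps, seed=0):
    """Draw reps values of sqrt(T) * (f(sample mean) - f(mu)) with f(z) = z**2."""
    rng = random.Random(seed)
    mu = 1.0  # mean of the unit exponential distribution
    values = []
    for _ in range(reps):
        mean = sum(rng.expovariate(1.0) for _ in range(T)) / T
        values.append(math.sqrt(T) * (mean * mean - mu * mu))
    return values


errors = simulate_scaled_errors(T=400, reps=2000)

a = 2.0    # derivative of f(z) = z**2 evaluated at the limit value 1
psi = 1.0  # variance of the unit exponential distribution
predicted_variance = a * psi * a          # scalar version of A Psi A'
observed_variance = statistics.pvariance(errors)
```

With several variables the same computation uses the matrix of partial derivatives in place of the scalar `a`.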


Let 


(4.1) 

A = M lu, 

(4.2) 

H - 

(4.3). 

C = plim Muu 1 

7°“"* 80 

(4.4) 

E - plim M„ , 

(4.5) 

L - P,„, 

(4.6) 

P = P„ - d/«d/7.', 

(4.7) 

A = n„, 

(4.8) 

11 - 11«. 

The matrix! L i,s tho 
random function BM 

random fumdion .IjWm, + 11,, A/,.,0/7!, -i* -i of A, P is the 
r.' A- n of H. 'Fhen 

(4,9) 

dL = (dA)Cr' , 

(4.10) 

However 

dP - (rf//)/r‘. 

(4,11) 

aifltk, a,i) « «,,H , 

(4.12) 

cr(a,k , b,i) = 0i,ki ) 

(4.13) 

^ib,k , b,i) = y,,ki , 


where atjki, ^hki, yi,ki are the appropriate quantities kabed, respective!}". From 
these we may compute σ(l_ij, l_kl), σ(l_ij, p_kl), and σ(p_ij, p_kl), the elements of the
asymptotic covariance matrix of the elements of L and P (which are asymptotically
normally distributed by the above). These elements can be estimated
consistently from the sample (the proof follows from Theorem 1).

4.2. The asymptotic distribution of β̂ and γ̂ for constant normalization. In this
section we shall show that β̂ and γ̂ are asymptotically normally distributed
(Theorem 3). In view of the above theorem on asymptotic distributions the
intricate part of the proof is in obtaining the covariance matrix. First we shall
demonstrate that the elements of uTV are o(1/\/t) in jn'obability. Since Assump¬ 
tion I holds, it is sufficient to show that s(P*,il'/,i.Pr.) is (?(l i-s/r) in probability. 
This means d | Px,M„Pr, | = 0, since each of the characteristic roots of 
Px.M aa Pxa except the smallest approaches a non-zero limit in probability. 





For any matrix A, A,, denotes the matrix obtained by deleting the i-th row 
and j-th column from A, and is the matrix obtained by deleting the t-th 
and fc-tli rows and the j'-th and J-th columns. Let 

where e = 0 if (t - fc) (j ~ Z) > 0 , 1 otherwise when i fc, j 5 ^ 1. = 0 

if i = k or j = l. In the rest of the paper we use the summation convention of
tensor calculus for lower case indices; namely, that whenever a lower case letter
appears as a superscript and a subscript in an expression, the corresponding
terms are to be summed on that index.

In general 


(4.14) d U I = A’^da.,-. 

We may consider Px.M„PL as a random function of P^,. Then 

(4.15) d(M-th element of P^M,,PL) = dpi + vletidp). 
However 


(4.1G) (n„EnL)’^ = p’fi' = 

where p' is a factor of proportionality. Since pilx. = 0, we have d j P«Ma.Px. | = 0. 
Then it can bo shown that d(flxiIW„ft*, - PiM,,P'z,) = 0, where fl„ = 
J _ P, 


Let 6 = TAitEll'x, and F — Px,M„P',x ■ We know that where 

Pi = l/p'^ (and the capital letter J indicates that there is not to be a sum on 
that index), and @ . Hence 

(4.17) d^‘ = p^dS’-' + Q'^dpj. 


However = 1; therefore pj — ( 6 ’'' 6 ^V.*) * From this it follows that 

(4.18) dpj = -(p/)*0’v.tde''-' 

From (4.14) wo see dd*"' = 0*‘^'"^d^op .Therefore 

(4.19) d^‘= P/[0'-''"^ - 

Let us define == iSV.y • Let us multiply (4.19) by and i/,. We obtain 

= p^s/e’-^dLn - pje^^dKp = , 

(4.21) = 0- 
Let us simplify (4.20), We see that 

(4.22) ^“d^T- = ^‘‘■Aekidpl. 





HeiK‘e 

(4.23) „ , 

“ = n-rr i 

aay, Let (f0\ == (ji\ und lot Qi ^ (?P). Then from ('L20) anti (4.23) wo obtain 

(4.24) -= Ri, 
and (4.21) b 

(4.26) \K>i “ 0. 

It may be shown (see [1], for example) that the solution is 

(4.26) Qi = (/ - ^V).it(Ou)--‘(7i:,)u(n«)-‘(Z - , 

where k (1 ≤ k ≤ H) is arbitrary except that β^k ≠ 0, and A_{.k} denotes A with
the k-th column deleted, etc. If the normalization is β^i = 1, k = i is a convenient
choice.

Since = —fiL, 

(4.27) (ir “ - H'dCt. 

Hence 

(4.28) <r(/3', r) “ -c0\ /3-)X7 - m\ 

(4.29) tr(^'", f) = ^OXTX; + <r(4\ i7)/3''x; + l^)0X -j- a(C, 

We, therefore, see that we must compute it0\ C)/?' and ir(i7, We find, 

from (4.20), (4.21), and (4.22) that 

(4.30) 6yfi‘(r0\ Hi) = —/3'/3VJc'"'’/9,-,>jt = Tiy , 

say. Let {<r0\ Ti)P^) = Qa, andlct J 22 = (rty). Then, from (4.30) and (4.21) we 
obtain 


(4.31) QQ2.^R2, 

(4.32) = 0. 

The solution is 

(4.33) 0, « (7 - P'<P).kiQkk)~^{Rilk^ . 

We find, readily, that 

(4.34) ^'/3V(c. i:) = « qr, 

say, where (c'"’’) = C~\ Let = (§'"''). This concludes the proof of Theorem 3. 

Theorem 3. If Assumptions A, B, F, H, I, K, L, and M are satisfied, √T(β̂ - β)
and √T(γ̂ - γ) are asymptotically jointly normally distributed with means zero
and covariance matrix

(4.35) a0', ^) = Qi, 





(4.3G) Qi, 

(4.37) 11(7 ) 'f) = Tlg^QJlxu “h ^xuQ2 “h Qsflm "I" Qs, 

where Q_1 is given by (4.26), Q_2 by (4.33), and Q_3 by (4.34).

If there is a kind of asymptotic independence of δ_t and z_t, then the above
expressions may be simplified. Corollary 1 results from Theorem 3 and the
following assumption:

1 ’’ 

Assumption N. lim — ^ S(f jzlsi) = whereR is defined in Assumption F. 

J- t—i 

Corollary 1. If Assumptions A, B, F, H, I, K, L, M, and N are satisfied,
√T(β̂ - β) and √T(γ̂ - γ) are asymptotically jointly normally distributed with
means zero and covariance matrix

(4.38) = <7^(7 - fi'fi) , 

(4.39) 7 ) = - /3V).fc(0tt)~‘(nx. + l7'7)r. , 

(4.40) = (’■^[(flxu + 7V).v(0i)b) ^(fliu + j/'7)k + C^’]. 

4 . 3 . Asymptotic distribution of the estimates of the parameters p and 7 with 
normalization a function of . 

If we relax Assumption L that is constant, we obtain a more general 
result. Since the proof, however, is more involved, we shall not give it here; 
the reader is referred to [1], In the derivation of the estimates On was defined as 
S(5^5<). In the asymptotic theory we do not assume that this is the same for 
each t. We use the following assumption; 

Assumption 0. lim — 2 &(.SuSt,SikZid = n„ki exists', 

r-^oo i e—1 


1 

lim yf, Yj <S(i5i,i5o) = w.j exists-, 

1 r 

lim yy, 53 &{SitSijSikSti) = o>t}ki + “owii exists, 
r-t«j I (—1 

Let S.jfci be the quantities n„ki corresponding to the it’s, e„*i, the quantities 
corresponding to the c’s. Define 


(4.41) 


(4.42) 

1 *J, 

ny ^ P TTyX etlkl , 

(4.43) 

q'i = (7 — /JV) t(04ft) ^iu}k 

(4.44) 

fc I ~ 

q& = X X > 

(4.45) 

— X P • 


With the aid of the matrices Q_1, Q_2, and Q_3, the vectors q_4 and q_5, and the
scalar q_6, we may express the asymptotic covariance matrix of the estimates.
We obtain

Theorem 4. If Assumptions A, B, F, H, I, K, M, and O are satisfied, and
Φ_xx is a function of Ω_xx, √T(β̂ - β) and √T(γ̂ - γ) are asymptotically jointly
normally distributed with means zero and covariance matrix

(4.4G) cr0\ ( 5 ) Qi +■ q^S + ff'q, + qili'0, 

(4.47) -f) « -Qifljtu + qiy — 0'qiflzu + qtd'y ~ Qi — 0'q&, 

^ — n*«947 — 7'?4ftiii + 96 tV 

(4.48) 

+ + Qallju -- y'qs — q^y + Qj, 

where Q_1, Q_2, Q_3, q_4, q_5, and q_6 are given by (4.26), (4.33), (4.34), (4.43), (4.44),
and (4.45) respectively.

Corollary 2. If Assumptions A, B, D, F, H and K are satisfied, and
Φ_xx = Ω_xx, √T(β̂ - β) and √T(γ̂ - γ) are asymptotically jointly normally
distributed with means zero and covariance matrix

(4.49) a0', ^) = (7 - 7iV)a(Ou)"‘(/ “ + m, 

(4.50) <r(4', ■?) =. -(7 - 0'il^).siBskr\Thu + ry)k. + Wy, 

(4.51) <r(f, - (flL + TV).*(0**)"'(fi« -h + (T* + h'T- 

5. Asymptotic distribution of the likelihood ratio criterion and the small
sample criterion for testing a certain hypothesis. The likelihood ratio criterion
for testing the hypothesis that the number of coordinates of z_t with zero
coefficients in the selected structural equation is as great as it is assumed to be is
(1 + ν)^{-T/2} [2, Theorem 2], where ν is the smallest root of

(5.1) I P,.M..P» - vW„ I - 0. 

Then 

(5.2) n-T 

From Theorem 5 of [4] it follows that the asymptotic distribution of Tv is the 
same as that of the quadratic form x t x', whore x has the limiting distribution 
of V T^Px,, use being made of plim $Wix$' » o-“. Wo have 

(5.3) dx' => 0^dpj + d$Vj. 

Let T = (7 - 0'ip).kiQkk)~V - ip'0)k. . Then 

(5.4) d/3' = ~v^0\Te„„dp: . 

Substituting in (5.3), we obtain 

dz' = 0^dpj — i»'*^'7rtemndp"fl-|, 





Then 

(5 f>) •t") = ~ 

say, and (t'") = E 

Let F ho cho^on so E = FF' and F'SF = iP is diagonal. Since EaE^E = BaE, 
the diagonal oloments ofare 1 and 0. The number of elements that are 1 is 
the rank of EtuE, namely, Z) — Zf + 1, wheie D is the number of coordinates 
of Vt (the number of coordinates whose coefficients in the selected equation are 

assumed to bo zero). Let z =-xF. Then the asymptotic distribution of Tv 

a 

is the distribution of zz' where z is normally distributed with mean zero and 
covariancematri.Y It is the x’^-distribution with Z) - ZZ + 1 degrees of freedom 
Wo observe that T log (1 + v) and TDk are as}rmptotically equal to Tv, where \ 
i.s the criterion based on .small sample theory [2, Theorem 4]. Finally, we note 
that V is independent of the normalization of 
Theorem 5. If Assumptions A, B, F, H, I, K, M, and N are satisfied, -2 times
the logarithm of the likelihood ratio criterion, -(T/2) log (1 + ν), the asymptotically
equivalent Tν and TD times the small sample criterion, λ, for testing the hypothesis
that the number of coordinates with zero coefficients is D are asymptotically distributed
as χ^2 with D - H + 1 degrees of freedom.

This theorem indicates how conservative the small sample test is asymptotically,
for that test asymptotically is equivalent to using Tν as having an
asymptotic χ^2-distribution with D degrees of freedom.

6. Asymptotic behavior of confidence regions based on small sample theory.
In [2] we deduced confidence regions for β and for β and γ when Assumption E
holds. If the normalization of β is


(6.1)    β Φ_xx β' = 1,

where Φ_xx is a given matrix, then a confidence region (a) for β of confidence ε
consists of all β* satisfying (6.1) and


( 6 . 2 ) 





F D,T~kU), 


where F_{D,T-K}(ε) is chosen so the probability of (6.2) for β* = β is ε and K is
the number of coordinates of z_t and D is the number of coordinates of v_t. A
region (b) for β and γ simultaneously consists of β* and γ* satisfying (6.1) and


(G.3) 


+ y* Mux0*' + y*M„,,y*' 

0*W„0*' 

< FK.T-Kie). 


We shall now show that even if Assumption E does not hold the regions have
asymptotically confidence coefficients ε and they are consistent under general
conditions.







Let r “ liMiJf'ui i 7, >' - dMtJIil. We nlwrve from Heotinn 4 that if 
AKSumptioiis A, H, F, H, K, L, M and X are Mili.stieil, the veet()rH\/?c and 
have aHyniptotie indejK'tulenf distriliuliona A'fO, and WfO, alf\ 
respectively Then nnd 7Vd/„('.V* willluiveasyraptotie, independent 

x’-diatritiiitions with /"’(“■ K- I)) and 1) tleKrees of freedom, respectively. 
Also approaelie.H stochaafieally. By 'Fheorems d and (i of [4], the left- 
hand sides of (0.2) and tO.H) have asymptotic /''-distrilnitions with D and T~K 
degrees of freedom and K and T - K degrees of freedom, respectively. 

Wc shall prove that (a) is conHistcnl for p; the proof is similar for (b) as a 
regioiiford andy. If we replace d hy b in the definition of c,c j/„c' ~ 

For h 7 ^ ^ we must show that the probability that h will fall in the confidence 
region ford approaches zero. The above form approaches hIT«A1I*d/ in proba¬ 
bility. If li 7^ d and satisfies (0.1) then MI« 0 and cM,/' has a non-zero limit 
in probability since A" is positive flcfinitc. Thus h is not. in the limiting confidence 
region, 

Theorem 6. If Assumptions A, B, F, H, I, K, M, and N are satisfied, the
confidence regions of Theorem 3 of [2] (including (a) and (b) above) are consistent,
and the regions (a) and (b) have asymptotically the confidence level ε.


REFERENCES

[1] T. W. Anderson and Herman Rubin, "Estimation of the parameters of a single
stochastic difference equation in a complete system," Cowles Commission for Research
in Economics, 1947, dittoed.

[2] T. W. Anderson and Herman Rubin, "Estimation of the parameters of a single
equation in a complete system of stochastic equations," Annals of Math. Stat., Vol. 20
(1949), pp. 46-63.

[3] H. Rubin, "Consistency and asymptotic normality in stable linear stochastic difference
systems," to be published.

[4] H. Rubin, "Topological properties of measures on topological spaces," Duke Math.
Journ., to be published.



SOME NONPARAMETRIC TESTS OF WHETHER THE LARGEST
OBSERVATIONS OF A SET ARE TOO LARGE
OR TOO SMALL

By John E. Walsh 
The Rand Corporation 

1. Summary. Let us consider a large number n of observations which are statistically
independent and drawn from continuous symmetrical populations. This
paper presents some nonparametric tests of whether the r largest observations
of the set are too large to be consistent with the hypothesis that these populations
have a common median value. Tests of whether the r largest observations are
too small to be consistent with this hypothesis are also considered. Here r is a
given integer which is independent of n.

Subject to some weak restrictions, it is shown that the significance level of a
test of the type presented tends to a value α as n increases. For no admissible
value of n, however, does the significance level of this test exceed 2α. If whether
the largest observations are too large is considered, tests with values of α suitable
for significance levels can be obtained for r > 4. Values of α suitable for
significance levels can be obtained for any value of r if whether the largest
observations are too small is investigated (n large).

Properties of the power functions of these tests are considered for the special
case in which the r largest observations are from populations with common
median θ, the remaining observations are from populations with common
median φ, and each population has the property that the distribution of the
quantity

(sample value) - (population median)

is independent of the value of the population median. For tests of θ > φ, the
power function tends to zero as θ - φ → -∞ and to unity as θ - φ → ∞. For
tests of θ < φ, the power function tends to unity as θ - φ → -∞ and to zero
as θ - φ → ∞.

Analogous tests of whether the smallest observations of a set are too small or
too large can be obtained from the tests of the largest observations by symmetry
considerations.

If there is strong reason to believe that the set of observations is a random 
sample from a continuous population, the tests presented in this paper can be 
used to decide whether the population is symmetrical. Tests of this nature are 
sensitive to symmetry in the tails of the population but not to symmetry in the 
central part. 

2. Introduction and statement of tests. The tests derived in this paper are 

applicable to situations of the following two types: 

(a). It is known that the observations are independent and from continuous
symmetrical populations (i.e., each population has a continuous cdf F
such that F(φ - x) = 1 - F(φ + x), where φ is the population median).
It is desired to test whether the largest observations are too large
(or too small) to be consistent with the assumption that the populations
have a common median value (if the 50% point of a continuous symmetrical
population is not unique, the median of this population is defined
to be the midpoint of the interval of 50% points).

(b). It is known that the observations are independent and from continuous
populations with a common median value (e.g., the observations may
be a sample from a continuous population). It is desired to test whether
these populations are symmetrical (with emphasis on the tails of the
populations).

With respect to (a), perhaps the most common practical application is that
where the observations are assumed to be a sample from a continuous symmetrical
population of some special type (e.g., normal) but the values of the
largest few observations make this assumption questionable. The nonparametric
tests presented for (a) are easily applied and a significant result for a
nonparametric test automatically implies that the observations are not a sample
from the specified type of population. Furthermore, if a parametric test of this
situation (i.e., a test based on the assumption of a sample from this special type
of population) is significant, the nonparametric tests are useful in determining
whether it is possible that the observations might be a sample from a continuous
symmetrical population of some other type.

With respect to (b), perhaps the most common application is that where the
set of observations can be considered to be a sample from a continuous population
and it is desired to test whether this population is symmetrical in the tails.

Now let us consider the forms of the tests. Let x(1), ..., x(n) represent the
values of the n observations arranged in increasing order of magnitude. Then
x(n + 1 - r), x(n + 2 - r), ..., x(n) are the r largest observations of the
set. For situations of type (a), the tests of whether the r largest observations
are too large are of the form

Test 1. Accept that the r largest observations are too large to be consistent with
the hypothesis that the populations have a common median if

min [x(n + 1 - i_k) + x(j_k); 1 ≤ k ≤ s ≤ r] > 2x(W_α),

where the i's, j's and n are integers such that

i_s = r,  i_u < i_{u+1},  j_u < j_{u+1},  j_s < W_α < n + 1 - r,

α is defined by

α = Pr{min [x(n + 1 - i_k) + x(j_k)] > 2φ | φ = common median},

and W_α = W_α(n) is the smallest integer satisfying the relation

(1)    Pr[x(W_α) < φ | φ = common median] ≤ α.




In testing the hypothesis of Test 1, the principle followed is to choose
x(n + 1 - r) and some subset of x(n + 2 - r), ..., x(n) for use in the test.
The integer s represents the total number of order statistics selected from
x(n + 1 - r), ..., x(n).

The value of α = α(i_1, ..., i_s; j_1, ..., j_s) is independent of n and is given
by equation (4) in Section 3. Table 1 contains some values of the i's, j's and s
which yield values of α suitable for significance levels. For Test 1, values of α
suitable for significance levels can be obtained for r > 4.


TABLE 1 

Some values of a for s < 6 



If the n independent observations satisfy the additional conditions

(A)  (i). Asymptotically (n → ∞), x(W_α) is statistically independent of
     min [x(n + 1 - i_k) + x(j_k); 1 ≤ k ≤ s].

     (ii). The standard deviations of x(W_α) and min [x(n + 1 - i_k) + x(j_k);
     1 ≤ k ≤ s] exist for all n and the limiting ratio (n → ∞)
     of these standard deviations is either zero or infinite.

     (iii). Let the notation σ(z) denote the standard deviation of z. If the
     populations have a common median φ, asymptotically the cdf's of
     [x(W_α) - φ]/σ{x(W_α)} and
     {min [x(n + 1 - i_k) + x(j_k)] - 2φ}/σ{min [x(n + 1 - i_k) + x(j_k)]}
     are continuous at the point zero,

then the significance level of Test 1 approaches the value α as n tends to infinity.

Although conditions (A) may appear to be complicated, they are not very
restrictive. These conditions are satisfied if the n observations are a sample
from a continuous population of the type usually encountered in practical
situations (i.e., approximated in practical situations). Perhaps the most well
known type of continuous symmetrical population for which a sample does not
satisfy conditions (A) is that with a triangular probability density function.
Part (ii) of conditions (A) is not satisfied for a sample from a population of
this type.

For large n, relation (1) with the equality sign is approximately satisfied if
W_α = [½n + ½K_α√n] (i.e., the largest integer contained in ½n + ½K_α√n).
Here K_α is the standardized normal deviate exceeded with probability α. This
value for W_α was obtained from the normal approximation to the binomial
distribution and furnishes a reasonably accurate solution of (1) with the equality
sign for n > 10 (see [1]).
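
The relation between the exact value of W_α and the normal approximation can be sketched as follows. The sketch assumes only that, for independent observations from continuous populations with common median φ, the number of observations below φ is binomial with parameters n and ½, so that Pr[x(W) < φ] is an upper binomial tail. The function names are illustrative and do not come from the paper.

```python
import math
from statistics import NormalDist


def upper_tail(n, w):
    """Pr[x(w) < phi]: probability that at least w of n observations lie below phi."""
    return sum(math.comb(n, k) for k in range(w, n + 1)) / 2 ** n


def exact_w(n, alpha):
    """Smallest integer W with Pr[x(W) < phi] <= alpha, as in relation (1)."""
    for w in range(1, n + 1):
        if upper_tail(n, w) <= alpha:
            return w
    raise ValueError("no admissible W for this n and alpha")


def approx_w(n, alpha):
    """Largest integer contained in n/2 + (1/2) K_alpha sqrt(n)."""
    k_alpha = NormalDist().inv_cdf(1.0 - alpha)  # deviate exceeded with probability alpha
    return math.floor(n / 2 + 0.5 * k_alpha * math.sqrt(n))


# Hypothetical illustration values.
n, alpha = 100, 0.05
w_exact = exact_w(n, alpha)
w_approx = approx_w(n, alpha)
```

Because the binomial distribution is discrete, the exact and approximate values may differ by a unit or so; the exact search is cheap enough to use directly whenever n is moderate.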

As an example of a test of type 1, let r ~ 5, « = 2, ji = 1, ji ~ 2, 4 = 4, 
t> = 5. Then « ^ .01)47 and the test is (approximately) 

Test 2. Accept the specified alternative of Test 1 if

min [x(?i. — 3) + x(l), a:(n - 4) + a:(2)] > 2a:(Jn + ^A'.oMT\/n). 

That this is a test of whether the 5 largest observations are too large is intuitively
evident from the fact that a significant result will be obtained only if both

x{n — 3) > 2x{^n + iK.ouiVn) — a:(l), 

x{n — 4) > 2a;(^n + iK.mV'n) ~ a:(2). 

If the smallest two of the five largest observations are too large, it seems reasonable
to suppose that all of the five are too large. A similar interpretation exists
for all tests of the type of Test 1.

The type (a) tests of whether the largest observations are too small are of
the form

Test 3. Accept that the r largest observations are too small to be consistent with
the hypothesis that the populations have a common median value if

max [x(n + 1 - j_k) + x(i_k); 1 ≤ k ≤ s ≤ r] < 2x(n + 1 - W_α),

where j_s = r, j_u < j_{u+1}, i_u < i_{u+1}, i_s < n + 1 - W_α < n + 1 - r, and both α
and W_α are defined in Test 1.

From the results for Test 1 and symmetry considerations, the significance
level of Test 3 tends to α as n → ∞ if conditions (A) are satisfied; it does not
exceed 2α for any admissible value of n. For Test 3, values of α suitable for
significance levels can be obtained for all values of r (n sufficiently large).

As indicated by (2), the tests of whether the largest observations are too large
can also be interpreted as tests of whether the smallest observations are too
large. Similarly the tests of whether the largest observations are too small can
also be interpreted as tests of whether the smallest observations are too small.

The above discussion presents intuitive reasons for believing that Tests 1 and 3
are suitable for the situations to which they are applied. To obtain a
semi-quantitative measure of the suitability of these tests, this paper investigates
the special case in which the r largest observations are from continuous symmetrical
populations with common median θ, the remaining observations are
from continuous symmetrical populations with common median φ, and each
population has the property that the distribution of x - ψ is independent of ψ,
where x is an observation from the population and ψ is the median of the population.
The power function of a test of type 1 or 3 is defined to be the probability
that the test is significant given the value of θ - φ. It is found that the power
functions of these tests have several desirable properties: For Test 1, the power
function tends to zero as θ - φ → -∞, is a monotonically increasing function
of θ - φ for θ - φ < 0, and tends to unity as θ - φ → ∞. For Test 3, the
power function tends to zero as θ - φ → ∞, is monotonically decreasing for
θ - φ < 0, and tends to unity as θ - φ → -∞.

For testing whether the populations are symmetrical in the tails given that
they are continuous and have a common median, i.e., situation (b), a combination
of Tests 1 and 3 is used. The resulting test is

Test 4. Accept that the populations are not symmetrical in the tails if either

min [x(n + 1 - i_k) + x(j_k); 1 ≤ k ≤ s] > 2x(W_α)

or

max [x(n + 1 - j_k) + x(i_k); 1 ≤ k ≤ s] < 2x(n + 1 - W_α),

where a < ^, u < fu+i, < j«+i, Ju, < C , 7. < Wa < n 1 — , and both a 

and W_α are defined in Test 1.

Since both inequalities in Test 4 can not be satisfied simultaneously, the
significance level of Test 4 tends to 2α as n → ∞ if conditions (A) are satisfied;
it never exceeds 4α for any admissible value of n.

The asymptotic distribution (n → ∞) of x(W_α) is usually not very sensitive
to symmetry of the populations. For example, if the n observations are a sample
from a population with a probability density function f(x) such that f(φ) ≠ 0
(φ = population 50% point), and f'(x) exists and is continuous in a neighborhood
of x = φ, it can be shown that the only property of f(x) which influences the
asymptotic distribution of x(W_α) is the value of f(φ). Thus, since a type 1 test
investigates both whether the largest observations are too large and whether
the smallest observations are too large (to be consistent with the assumption of
symmetry), while a type 3 test investigates both whether the largest observations
are too small and whether the smallest observations are too small, Test 4 should
be suitable for testing whether a population has symmetrical tails.







3. Theorems and derivations. The fundamental fact used in this paper is
that, if the observations are from continuous symmetrical populations with
common median φ, the value of

α = Pr{min [x(n + 1 - i_k) + x(j_k); 1 ≤ k ≤ s] > 2φ}

  = Pr{max [x(n + 1 - j_k) + x(i_k); 1 ≤ k ≤ s] < 2φ}

is independent of n for the values of n permitted in the tests. This result is a
special case of the following theorem.

T.'hkokkm 1, Consider a set of n independent observations from continuous 
symmetrical populations with commonmedian^.lAiii < ■■■ <i,andji< <j, 
be fixed sets of integers whose values are independent of n. Then the value of 

Prfdth largest of [x(n + 1 — jh) + 1 < < s] < 2<^1 


is the same for all values of n which arc >i, + jt — 1. In particular 

r\“^ 

a = 


(3) 


where 


ffl(i ) mO) rn(S)—^3 

1 + w(l) + £ ~ W + S S [w(l) —hi~'hf\ + 


m(u) wC 

+ JL 


m(!l)—tu_i 

Z) [m(l) 

A,-I 


hi 


^U-J fj 


w i.+ j, “1, w ■= jt ~ 1, m(Jt + Vr — 1) * i, + j, — it — jt - + 1, 

< = 0, 1, • • ■ , s “ 1, 1 ^ yj < jiH ~ ft, io « Jo “ 1 = 0. 

PROOF. It is sufficient to prove the theorem for the expression

$\Pr\{\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\},$

since any probability expression of the form $\Pr\{h\text{th largest of } [\ ] < 2\phi\}$
can be expressed as a specified constant plus a sum of probabilities of the form
$\Pr\{\max[\ ] < 2\phi\}$ multiplied by specified constants, where in each case the
terms in the [ ] are a subset of the s terms $x(n + 1 - j_k) + x(i_k)$, $(1 \le k \le s)$.
Let the integer n have the value $n_0$. Then it can be verified that

$\Pr\{\max[x(n_0 + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\}$

(4) $= \Pr\{\max(2x(n_0 - j_s),\ x[n_0 + 1 - W] + x[n_0 + 1 - W - m(W)];\ 1 \le W \le j_s) < 2\phi\},$

where 

m(jt + Vt - 1) = no + 2 — 4 — 4 — B<, m(j,) « no — 4 — J« > 4 

« = 0,1, • • • , 8 — 1, 1 < y< < Jh-i “ Ji, 4 = Jo - 1 =■ 0) 

by the use of Theorem 4 of [2]. By the proof of Theorem 5 of [2], the value
of the second term in (4) equals





I>r[m-A\ i2.r(?i„ - j.’l, -)- 2 - ]V] + a:[n,(, + i _ )y _ m{W)], 

l<W<j, + l} < 20] 

where $m(j_s + 1) = -1$ and the expression is based on $n_0 + 1$ rather than $n_0$ observations
(the values of the m's are the same as in (4)). The value of this expression,
however, can be shown to equal the value of

$\Pr\{\max(2x(n_0 + 1 - j_s),\ x[n_0 + 2 - W] + x[n_0 + 2 - W - m(W)];\ 1 \le W \le j_s) < 2\phi\},$

which by (4) equals the value of

$\Pr\{\max[x(n_0 + 2 - j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\}$

if $n = n_0 + 1$ for this expression. Thus, by induction, the value of

$\Pr\{\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\}$

is the same for all sample sizes $n \ge i_s + j_s$. An analysis similar to that used
in the proof of Theorem 5 of [2] shows that this also holds for $n = i_s + j_s - 1$.
Equation (3) was obtained by taking $n = n_0 = i_s + j_s - 1$, the m's as given by
(4) with this value of n, and substituting into Theorem 4 of [2].
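
The invariance asserted by Theorem 1 can be checked by direct enumeration in the simplest setting. The sketch below assumes, beyond what the theorem requires, that the observations are identically distributed with median 0. Under that assumption the sign of a sum of two order statistics depends only on the signs of the observations and on the ranks of their absolute values, so every assignment of signs to the magnitudes 1, ..., n is equally likely. The function name `prob_max_below_median` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_max_below_median(n, i_idx, j_idx):
    """Exact Pr{max_k [x(n+1-j_k) + x(i_k)] < 0} for n i.i.d. observations
    from a continuous distribution symmetric about 0 (an assumption of this
    sketch; Theorem 1 itself allows different symmetric populations)."""
    hits = 0
    for signs in product((-1, 1), repeat=n):
        # Signed ranks stand in for the sample: only the signs and the
        # ordering of the absolute values matter for the event.
        xs = sorted(s * m for s, m in zip(signs, range(1, n + 1)))
        # The paper's order statistics x(1) <= ... <= x(n) are 1-indexed.
        if max(xs[n - j] + xs[i - 1] for i, j in zip(i_idx, j_idx)) < 0:
            hits += 1
    return Fraction(hits, 2 ** n)


if __name__ == "__main__":
    # s = 2, i = (1, 2), j = (1, 2): the theorem applies for n >= 3.
    for n in range(3, 8):
        print(n, prob_max_below_median(n, (1, 2), (1, 2)))
```

With i = (1, 2) and j = (1, 2) the printed probability should not change with n, in agreement with the statement that the value is the same for all $n \ge i_s + j_s - 1$.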

Another basic result is that, if the observations are from continuous symmetrical
populations with common median $\phi$, the value of

$\Pr\{\min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s] > 2x(W_\alpha)\}$

$\quad = \Pr\{\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2x(n + 1 - W_\alpha)\}$

is always less than or equal to $2\alpha$. This is a particular application of the theorem

THEOREM 2. Consider n independent observations from continuous symmetrical
populations with common median $\phi$. Then, for any integer W,

$\Pr\{\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2x(W)\}$

$\quad \le \Pr\{\max[x(n + 1 - j_k) + x(i_k)] < 2\phi\} + \Pr\{x(W) > \phi\}$

$\qquad - \Pr\{\max[x(n + 1 - j_k) + x(i_k)] < 2\phi,\ x(W) > \phi\}.$

PROOF.

$\Pr\{\max[\ ] < 2x(W)\} = \Pr\{\max[\ ] < 2\phi,\ x(W) > \phi\}$

$\quad + \Pr\{\max[\ ] < 2\phi,\ x(W) \le \phi,\ \max[\ ] < 2x(W)\}$

$\quad + \Pr\{\max[\ ] \ge 2\phi,\ x(W) > \phi,\ \max[\ ] < 2x(W)\}$

$\le \Pr\{\max[\ ] < 2\phi,\ x(W) > \phi\} + \Pr\{\max[\ ] < 2\phi,\ x(W) \le \phi\}$

$\quad + \Pr\{\max[\ ] \ge 2\phi,\ x(W) > \phi\}$

$= \Pr\{\max[\ ] < 2\phi\} + \Pr\{x(W) > \phi\} - \Pr\{\max[\ ] < 2\phi,\ x(W) > \phi\}.$





If the n independent observations satisfy conditions (A) in addition to being
from continuous symmetrical populations with a common median value, the
significance level of Tests 1 and 3 tends to $\alpha$ as $n \to \infty$. This follows from
symmetry considerations and

THEOREM 3. Consider n independent observations which satisfy conditions (A)
and are from continuous symmetrical populations with a common median value.
Then

$\lim_{n\to\infty} \Pr\{\min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s] > 2x(W_\alpha)\} = \alpha.$

PROOF. Let

$Y = \min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s]$

and consider the case where

$\lim_{n\to\infty} \sigma[x(W_\alpha)]/\sigma(Y) = 0.$

Since the populations are continuous, $\sigma(Y) > 0$ and

$\Pr[Y > 2x(W_\alpha)] = \Pr[Y - 2\phi > 2x(W_\alpha) - 2\phi]$

$\quad = \Pr\{[Y - 2\phi]/\sigma(Y) > 2[x(W_\alpha) - \phi]/\sigma(Y)\}.$

Let

$Z = 2[x(W_\alpha) - \phi]/\sigma(Y).$

Then, from (i) of conditions (A),

$\Pr[Y > 2x(W_\alpha)] = \int_{-\infty}^{\infty} \Pr\{[Y - 2\phi]/\sigma(Y) > a\}\, dF_n(a) + \beta(n),$

where $F_n$ is the cdf of Z and $\lim_{n\to\infty} \beta(n) = 0$.

Let b be any positive number. From $\lim_{n\to\infty} \sigma(Z) = 0$, (ii) of conditions (A), and
the definition of $x(W_\alpha)$, the mean of Z exists for all values of n and tends to
zero as $n \to \infty$. Then, by Tchebycheff's Inequality, it can be shown that

$\int_{-b}^{b} dF_n(a) = 1 - \gamma(n),$

where $\lim_{n\to\infty} \gamma(n) = 0$.

From (iii) of conditions (A)

$\lim_{n\to\infty} \Pr\{[Y - 2\phi]/\sigma(Y) > -b\} = \lim_{n\to\infty} \Pr\{[Y - 2\phi]/\sigma(Y) > b\} + \delta(b),$

where $\lim_{b\to 0} \delta(b) = 0$.

Using the above relations, letting $n \to \infty$ first and then $b \to 0$, it follows from
Theorem 1 that

$\lim_{n\to\infty} \Pr[Y > 2x(W_\alpha)] = \Pr\{[Y - 2\phi]/\sigma(Y) > 0\} = \alpha.$






A similar proof shows that this limiting relation also holds when

$\lim_{n\to\infty} \sigma[x(W_\alpha)]/\sigma(Y) = \infty.$

Finally consider properties of the power functions of Tests 1 and 3 for the
special situation outlined in sections 1 and 2. The properties stated in the preceding
two sections follow from

THEOREM 4. Let $x(n + 1 - r), \cdots, x(n)$ be from continuous symmetrical populations
with common median $\theta$, the remaining order statistics from continuous symmetrical
populations with common median $\phi$, and each population have the property
that the distribution of $x - \xi$ is independent of $\xi$, where x is an observation from
the population and $\xi$ is the median of the population. Also let

$P_1(\Phi) = \Pr\{\min[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s \le r] > 2x(W_\alpha) \mid \theta - \phi = \Phi\},$

where the conditions for Test 1 are satisfied, and

$P_3(\Phi) = \Pr\{\max[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s \le r] < 2x(n + 1 - W_\alpha) \mid \theta - \phi = \Phi\},$

where the conditions for Test 3 are satisfied. Then

$\lim_{\Phi\to-\infty} P_1(\Phi) = 0, \qquad \lim_{\Phi\to\infty} P_1(\Phi) = 1,$

$\lim_{\Phi\to-\infty} P_3(\Phi) = 1, \qquad \lim_{\Phi\to\infty} P_3(\Phi) = 0,$

$P_1(\Phi)$ is a monotonically increasing function of $\Phi$ for $\Phi < 0$, and $P_3(\Phi)$ is a
monotonically decreasing function of $\Phi$ for $\Phi < 0$.
PROOF. It is sufficient to prove this theorem for the power function of Test 3.
The results for $P_1(\Phi)$ can be obtained from symmetry considerations and obvious
modifications of the proof for $P_3(\Phi)$.

First consider $P_3(\Phi)$ for the case where $\Phi < 0$. Let a new set of observations
be formed from the given set by subtracting the median value of the corresponding
population from each observation. Let $y(1), \cdots, y(n)$ be the values
of the set of modified observations arranged in increasing order of magnitude.
Since $\Phi < 0$, $\theta < \phi$ and

$y(t) = x(t) - \phi$ for $1 \le t \le n - r$,

$y(t) = x(t) - \theta$ for $n - r + 1 \le t \le n$.

Thus

p8($) = Pr (max [y(n -f- 1 - /*) + 1 < * ^ s ^ 

— 2y{n 1 — W A < 





whence it follows that $P_3(\Phi)$ is a monotonically decreasing function of $\Phi$ for
$\Phi < 0$ and that $\lim_{\Phi\to-\infty} P_3(\Phi) = 1$.

Now consider the case where $\Phi > 0$. Again form the set of modified observations
and let $y(1), \cdots, y(n)$ be the values of these observations arranged in
increasing order of magnitude. Then it is easily seen that

P,(^) < Prly(l) ~ y(n) < 

so that $\lim_{\Phi\to\infty} P_3(\Phi) = 0$.


REFERENCES 

[1] Paul G. Hoel, Introduction to Mathematical Statistics, John Wiley and Sons, 1947,
p. 45.

[2] John E. Walsh, "Some significance tests for the median which are valid under very
general conditions," Annals of Math. Stat., Vol. 20 (1949), pp. 64-81.



ON A MEASURE OF DEPENDENCE BETWEEN 
TWO RANDOM VARIABLES 

By Nils Blomqvist 

University of Stockholm and Boston University

1. Summary. The properties of a measure of dependence q' between two
random variables are studied. It is shown (Sections 3-5) that q' under fairly
general conditions has an asymptotically normal distribution and provides
approximate confidence limits for the population analogue of q'. A test of independence
based on q' is non-parametric (Section 6), and its asymptotic efficiency
in the normal case is about 41% (Section 7). The q'-distribution in the case of
independence is tabulated for sample sizes up to 50.

2. Introduction and definitions. In drawing conclusions from statistical data 
it frequently happens that it is unnecessary to utilize all the information given 
by the data. In such cases it seems desirable to use methods which are 

1) valid under rather weak assumptions regarding the distribution of the 
population and 

2) easy to deal with in practice. 

Naturally such methods should always be used, but their applicability is, in
most cases, limited by their small efficiency.

Concerning methods of measuring correlation and testing independence some
so-called rank correlation coefficients have been defined [2, 3, 4, 6] which have
the first property. In large samples these are, however, rather tiresome to calculate,
and a simpler method might then be preferable. The coefficient studied
here has in most cases both properties mentioned above and can be used whenever
its efficiency is not too small.

Let $(x_1, y_1), \cdots, (x_n, y_n)$ be a sample from a two-dimensional population with
cdf $F(x, y)$, and consider the two sample medians $\tilde{x}$ and $\tilde{y}$. The cdf $F(x, y)$ is
assumed to have continuous marginal cdf's $F_1(x)$ and $F_2(y)$ in order that the
probability of obtaining two equal x-values or two equal y-values in the sample
will be zero. Let the x, y-plane be divided into four regions by the lines $x = \tilde{x}$
and $y = \tilde{y}$. It is then clear that some information about the correlation between
x and y can be obtained from the number of sample points, say $n_1$, belonging
to the first or third quadrants compared with the number, say $n_2$, belonging
to the second or fourth quadrants.

Before going further we shall explain what is meant here by belong to. If
the sample size n is an even number the calculation of $n_1$ and $n_2$ is evident. If,
however, n is an odd number one or two sample points must fall on the lines
$x = \tilde{x}$ and $y = \tilde{y}$. In the first case this sample point shall not be counted. In
the other case one point falls on each of the lines. Then one of the points shall
be said to belong to the quadrant touched by both points, while the other shall
not be counted. It is easy to verify that both $n_1$ and $n_2$ by this method will be
even numbers.

As a measure of correlation we define

(1) $q' = \frac{n_1 - n_2}{n_1 + n_2} = \frac{2n_1}{n_1 + n_2} - 1 \qquad (-1 \le q' \le 1).$

The definition of q' is not new [5] but as far as is known, its statistical properties
have never been studied completely.
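
The counting rule of Section 2 and definition (1) can be written out as a short routine. This is only a sketch in Python: the function name `blomqvist_counts` is hypothetical, the sample is assumed to contain at least two points and no tied x-values or y-values (as the continuity assumption guarantees with probability one), and exact arithmetic on the median is assumed, so integer or otherwise exactly representable data are safest.

```python
from statistics import median


def blomqvist_counts(points):
    """Return (n1, n2, q') for a list of (x, y) pairs without ties.

    n1 counts points in the first or third quadrant around the sample
    medians, n2 those in the second or fourth, following the paper's
    convention for points lying on the median lines when n is odd.
    """
    mx = median(x for x, _ in points)
    my = median(y for _, y in points)
    n1 = n2 = 0
    side_x = side_y = 0  # signs contributed by the two points on the lines
    for x, y in points:
        dx, dy = x - mx, y - my
        if dx == 0 and dy == 0:
            continue                      # one point on both lines: not counted
        if dx == 0:
            side_y = 1 if dy > 0 else -1  # point on the vertical line
        elif dy == 0:
            side_x = 1 if dx > 0 else -1  # point on the horizontal line
        elif dx * dy > 0:
            n1 += 1
        else:
            n2 += 1
    if side_x and side_y:
        # Two points on the lines: count one of them in the quadrant that
        # both touch, and drop the other.
        if side_x * side_y > 0:
            n1 += 1
        else:
            n2 += 1
    return n1, n2, (n1 - n2) / (n1 + n2)
```

For the three points (1, 2), (2, 3), (3, 1) the medians are 2 and 2; one point lies on each line and both touch the second quadrant, so the routine returns $n_1 = 0$, $n_2 = 2$ and $q' = -1$, with both counts even as stated in the text.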


3. The asymptotic distribution. It is known [1] that the median in a sample
from a one-dimensional distribution under certain conditions is a consistent
estimate of the population median and asymptotically normally distributed.
Although it seems possible to weaken the requirements in our case, we shall not
do so. We require that

a) the population medians are uniquely defined (and assumed to equal zero),

b) the marginal distributions of $F(x, y)$ admit density functions $f_1(x)$ and
$f_2(y)$,

c) $f_1(x)$, $f_2(y)$ and their first derivatives are continuous in some neighbourhood
of the origin and

d) $f_1(0)$ and $f_2(0)$ are $\ne 0$.

In order to avoid trivial complications we shall assume here that the sample
size $n = 2k + 1$.

Now define for every arbitrarily chosen point (x, y)

(2) $a(x, y) = P\{\xi > x,\ \eta > y\},$
    $b(x, y) = P\{\xi < x,\ \eta > y\},$
    $c(x, y) = P\{\xi < x,\ \eta < y\},$
    $d(x, y) = P\{\xi > x,\ \eta < y\},$

where the measure P refers to the cdf $F(x, y)$ and evidently

$a + b + c + d = 1.$

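As the number of sample points belonging to the first and third quadrants
around $(\tilde{x}, \tilde{y})$ must be equal, the probability of the combined event

$\{n_1 = 2r;\ \tilde{x} \in (x, x + dx),\ \tilde{y} \in (y, y + dy)\}$

is

(3) $p_k(2r; x, y) = \frac{(2k + 1)!}{(r!)^2 ((k - r)!)^2}\,(ac)^r (bd)^{k - r} \cdot S,$

where

(4) $S = \frac{r}{a}\, d_x a \cdot d_y a - \frac{k - r}{b}\, d_x b \cdot d_y b + \frac{r}{c}\, d_x c \cdot d_y c - \frac{k - r}{d}\, d_x d \cdot d_y d + dF.$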

Each of the first four terms of the expression (4) corresponds to a case in which two
sample points determine $(\tilde{x}, \tilde{y})$ and the last term to the case in which $(\tilde{x}, \tilde{y})$
is determined by only one point. From (3) the probability of
obtaining $n_1$ at most equal to $2R$ is

(5) $P\{n_1 \le 2R\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sum_{r=0}^{R} p_k(2r; x, y).$

If we introduce the joint cdf $\Psi_k(x, y)$ of $\tilde{x}$ and $\tilde{y}$, (5) can be written
as

(6) $P\{n_1 \le 2R\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} d\Psi_k(x, y)\, \frac{\sum_{r=0}^{R} p_k(2r; x, y)}{\sum_{r=0}^{k} p_k(2r; x, y)},$

where

$d\Psi_k(x, y) = \sum_{r=0}^{k} p_k(2r; x, y).$

Clearly the integrand in (6) is $\le 1$ everywhere it exists. In the points (x, y)
where the denominator is equal to zero the integrand is undefined, but as the
measure ($\Psi_k$) of the set of such points is zero, we need not have any trouble
with them.

Under the conditions a)-d) $\tilde{x}$ and $\tilde{y}$ converge in probability to zero; that is

$\lim_{k\to\infty} \Psi_k(x, y) = 1$ for $\{x > 0,\ y > 0\}$, and $= 0$ otherwise.

Thus, when k and R tend to infinity such that $R/k \to$ const, (6) becomes

(7) $\lim P\{n_1 \le 2R\} = \lim \frac{\sum_{r=0}^{R} p_k(2r; 0, 0)}{\sum_{r=0}^{k} p_k(2r; 0, 0)}.$

According to (3)

(8) $p_k(2r; 0, 0) = \frac{(2k + 1)!}{(r!)^2 ((k - r)!)^2}\,(a_0 c_0)^r (b_0 d_0)^{k - r} \cdot S_0,$

where the subscripts indicate the value at the point (0, 0). Because of (2),
$c_0 = a_0$, $d_0 = b_0$ and $a_0 + b_0 = \tfrac{1}{2}$,

and the two parts of (8) are for large k


(2fc + I)! 2r L2a-r) ^ _^-C(r’-2fcao)®/4ftao6o) 

— 7*)P ^ ^ 27rao6o'\/2'7rfc 





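and

$S_0 \sim 2k \left[ \left(\frac{\partial a}{\partial x}\right)_0 \left(\frac{\partial a}{\partial y}\right)_0 - \left(\frac{\partial b}{\partial x}\right)_0 \left(\frac{\partial b}{\partial y}\right)_0 + \left(\frac{\partial c}{\partial x}\right)_0 \left(\frac{\partial c}{\partial y}\right)_0 - \left(\frac{\partial d}{\partial x}\right)_0 \left(\frac{\partial d}{\partial y}\right)_0 \right] dx\, dy.$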


The first of these expressions follows from the usual application of Stirling’s 
approximation formula and we omit all details here. 

Hence, after the introduction of

$r = 2k a_0 + t\sqrt{2k a_0 b_0},$

$R = 2k a_0 + \tau\sqrt{2k a_0 b_0},$

the expression (7) is transformed to

(9) $\lim_{k\to\infty} P\left\{\frac{n_1 - 4k a_0}{\sqrt{8k a_0 b_0}} < \tau\right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\tau} e^{-t^2/2}\, dt.$

From (9) it follows that $n_1$ is asymptotically normally distributed with mean
$4k a_0$ and standard deviation $\sqrt{8k a_0 b_0}$. Thus

$q' = \frac{n_1}{k} - 1$

is asymptotically normally distributed with mean $4a_0 - 1$ and standard deviation
$2\sqrt{a_0(1 - 2a_0)/k}$.


4. Properties as an estimator. Suppose we measure the correlation between
x and y by

(10) $q = 2\left[\int_0^{\infty}\int_0^{\infty} dF + \int_{-\infty}^{0}\int_{-\infty}^{0} dF\right] - 1 = 4a_0 - 1,$

where, as before, (0, 0) are the coordinates of the population medians. Then q
has the desired property of being equal to zero in the case of independence and
equal to $\pm 1$ in the case of linear relationship between x and y.

According to (9) q' is a consistent estimate of q when the conditions a)-d) are
fulfilled. Furthermore, as the standard deviation of q' is, to a first approximation,
independent of quantities other than q, it is possible to construct approximate
confidence limits for q for large sample sizes. This is done in the following way.
In terms of n and q we have, according to the last paragraph of section 3 and
(10),

$\sigma(q') \approx \sqrt{\frac{1 - q^2}{n}}.$

Let $\Phi(x)$ be a standardized normal cdf and $\lambda_1$ and $\lambda_2$ two numbers such that
$\Phi(\lambda_2) - \Phi(\lambda_1) = 1 - \alpha$. According to (9) we then have

(11) $P\left\{\lambda_1 < \frac{q' - q}{\sqrt{1 - q^2}} \cdot \sqrt{n} < \lambda_2\right\} \approx 1 - \alpha,$

which gives the desired result.

If we let $\lambda_2 = -\lambda_1 = \lambda$ and solve the inequality in (11) for q, the following
symmetrical confidence interval is obtained

$q' - \frac{\lambda}{n}\sqrt{\lambda^2 + n(1 - q'^2)} < q < q' + \frac{\lambda}{n}\sqrt{\lambda^2 + n(1 - q'^2)},$

where we have used that $\lambda^2 \ll n$.
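
The symmetrical interval obtained from (11) is simple to evaluate numerically. The following Python sketch does so; the function name `blomqvist_confidence_interval` and its default level are illustrative choices, and the normal quantile is taken from the standard library rather than from tables. The limits are approximate and meant for large n only.

```python
from math import sqrt
from statistics import NormalDist


def blomqvist_confidence_interval(q_prime, n, alpha=0.05):
    """Approximate symmetrical (1 - alpha) confidence limits for q,
    valid for large n (the derivation uses lambda**2 << n)."""
    # lam satisfies Phi(lam) - Phi(-lam) = 1 - alpha.
    lam = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = (lam / n) * sqrt(lam ** 2 + n * (1 - q_prime ** 2))
    return q_prime - half_width, q_prime + half_width


if __name__ == "__main__":
    print(blomqvist_confidence_interval(0.0, 100))
```

For q' = 0 and n = 100 at the 5% level the half-width is about 0.20, which is close to $\lambda/\sqrt{n}$, as one would expect when $\lambda^2$ is small compared with n.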


5. The normal case. If x and y are normally distributed with correlation
coefficient $\rho$, we have

(12) $q = \frac{2}{\pi} \arcsin \rho.$

This expression is the same as the mean of Esscher-Kendall's rank correlation
coefficient t [2, 4]. Hence, in the normal case q' and t estimate the same quantity.
The coefficient q' has, however, a much smaller efficiency. The asymptotic
efficiency of q' relative to the afore mentioned coefficient is

$\frac{\sigma^2(t)}{\sigma^2(q')} = \frac{4}{9}$

for $\rho = 0$.

6. Tests of independence based on q'. In testing independence between x
and y it is in practice more convenient to use critical regions based on $n_1$ instead
of q'. Since, under the null hypothesis, the measure of a critical region is independent
of $F(x, y)$ ($F_1(x)$ and $F_2(y)$ are assumed to be continuous), any test
based on $n_1$ is non-parametric. We have made exact calculations of the q'-distribution
for sample sizes n up to 50. For larger sample sizes the normal approximation
for $n_1$ does not seem to entail errors of practical importance.

To derive the exact distribution of $n_1$ under the null hypothesis we suppose
that n equals 2k. The probability that any k sample points shall have smaller
x-values than the other k points is

$\binom{2k}{k}^{-1}.$

Hence, since any arrangement of the sample points according to their x-values
does not affect the distribution of the y-values,

(13) $P\{n_1 = 2r\} = \frac{\binom{k}{r}^2}{\binom{2k}{k}}.$




If $n = 2k + 1$ it is easily verified that the probability (13) remains unchanged,
if we use the procedure in calculating $n_1$ and $n_2$ proposed in Section 2. This is,
in fact, the main reason for the proposal.
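
Formula (13) makes the exact null distribution easy to compute for any k. The Python sketch below (function names are hypothetical) evaluates the probabilities in exact rational arithmetic, together with two-sided tail probabilities of the form $P\{|n_1 - k| \ge \nu\}$ and the variance of $n_1$.

```python
from fractions import Fraction
from math import comb


def null_pmf(k):
    """P{n1 = 2r}, r = 0..k, under independence, from equation (13)."""
    total = comb(2 * k, k)
    return [Fraction(comb(k, r) ** 2, total) for r in range(k + 1)]


def tail_probability(k, nu):
    """P{|n1 - k| >= nu} under independence."""
    return sum(p for r, p in enumerate(null_pmf(k)) if abs(2 * r - k) >= nu)


def null_variance(k):
    """Variance of n1 about its mean k under independence."""
    return sum(p * (2 * r - k) ** 2 for r, p in enumerate(null_pmf(k)))


if __name__ == "__main__":
    for k in (2, 3, 5, 8):
        print(2 * k, [float(tail_probability(k, nu)) for nu in range(k % 2, k + 1, 2)])
```

For 2k = 4 the probability of $|n_1 - k| \ge 2$ is 1/3, and for 2k = 6 the probability of $|n_1 - k| \ge 3$ is 1/10; the variance works out to $k^2/(2k - 1)$ in the cases tried.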


TABLE

Values of $P\{|n_1 - k| \ge \nu\}$ in the case of independence, for $2k = 4, 6, \cdots, 50$
[the tabulated entries are not recoverable from this copy]

$2k$ is the largest even number contained in the sample size.


The distribution of $n_1$ is symmetric about $n_1 = k$ with the variance

$\frac{k^2}{2k - 1}.$

Thus, in testing independence we can for large sample sizes use

$\frac{n_1 - k}{k} \cdot \sqrt{2k - 1}$

as an approximately normally distributed random variable with mean zero
and unit s.d.

7. The asymptotic efficiency of the q'-test. In the case that x and y are normally
distributed with the correlation coefficient $\rho$, it is possible, but rather
tedious, to calculate the power function of the q'-test. We will, therefore, restrict
ourselves to considering only the asymptotic behavior of the power function.

Consider tests of independence ($\rho = 0$) against one-sided alternatives $\rho > 0$.
Let $L_m^{(1)}(\rho)$ be the power function of the q'-test for the sample size m and $L_n^{(2)}(\rho)$
be the power function of the test based on the correlation coefficient r in a
sample of size n. We assume that all tests have the same size, i.e.

(14) $L_m^{(1)}(0) = L_n^{(2)}(0) = \alpha$

for all m and n. We shall say that the q'-test has the asymptotic efficiency e if

(15) $\lim_{n\to\infty} \left(\frac{\partial L_m^{(1)}}{\partial \rho}\right)_{\rho=0} \Big/ \left(\frac{\partial L_n^{(2)}}{\partial \rho}\right)_{\rho=0} = 1$ when $m = \frac{n}{e}$.

This means that the sample size in using the r-test need only be 100e% of
that in using the q'-test, in order to get the same derivative of the power functions
at $\rho = 0$ (for large sample sizes). Since the definition of e only concerns the
behavior in the neighborhood of $\rho = 0$, it might perhaps be more correct to call e
the asymptotic local efficiency.

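In order to calculate e we define two sequences $\{q_m\}$ and $\{r_n\}$ such that
$\{q' > q_m\}$ and $\{r > r_n\}$ are tests with the afore mentioned properties. According
to (9) and (10) q' is asymptotically normally distributed with mean q and s.d.
$\sqrt{(1 - q^2)/m}$. Furthermore, r is asymptotically normally distributed with mean
$\rho$ and s.d. $(1 - \rho^2)/\sqrt{n}$. Hence,

$1 - L_m^{(1)}(\rho) = P\{q' < q_m \mid \rho\} \sim \Phi\left(\frac{q_m - q}{\sqrt{1 - q^2}} \sqrt{m}\right),$

$1 - L_n^{(2)}(\rho) = P\{r < r_n \mid \rho\} \sim \Phi\left(\frac{r_n - \rho}{1 - \rho^2} \sqrt{n}\right),$

from which it follows

(16) $\left(\frac{\partial L_m^{(1)}}{\partial \rho}\right)_0 \Big/ \left(\frac{\partial L_n^{(2)}}{\partial \rho}\right)_0 \sim \frac{\Phi'(q_m \sqrt{m})\, \sqrt{m}\, \left(\frac{dq}{d\rho}\right)_0}{\Phi'(r_n \sqrt{n})\, \sqrt{n}}.$

According to (14) we have

$\lim_{m\to\infty} q_m \sqrt{m} = \lim_{n\to\infty} r_n \sqrt{n} = \Phi^{-1}(1 - \alpha).$

Thus we conclude

(17) $\lim \left(\frac{\partial L_m^{(1)}}{\partial \rho}\right)_0 \Big/ \left(\frac{\partial L_n^{(2)}}{\partial \rho}\right)_0 = \lim \sqrt{\frac{m}{n}} \left(\frac{dq}{d\rho}\right)_0.$

Clearly (17) is equal to 1 if

$\frac{n}{m} = \left(\frac{dq}{d\rho}\right)_0^2.$

Hence, according to (12) and (10)

$e = \left(\frac{2}{\pi}\right)^2 = \frac{4}{\pi^2}.$

In other words, the asymptotic efficiency of the q'-test is about 41%.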
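
The figure of about 41% can be reproduced numerically from (12). The sketch below takes the efficiency to be the square of $dq/d\rho$ at $\rho = 0$, which is how the condition following (17) is read here; the function names are hypothetical and the derivative is estimated by a central difference rather than analytically.

```python
from math import asin, pi


def q_of_rho(rho):
    """Population value q = (2/pi) * arcsin(rho) in the normal case, eq. (12)."""
    return 2 / pi * asin(rho)


def local_efficiency(h=1e-6):
    """Square of dq/d(rho) at rho = 0, estimated by a central difference."""
    slope = (q_of_rho(h) - q_of_rho(-h)) / (2 * h)
    return slope ** 2


if __name__ == "__main__":
    print(local_efficiency(), 4 / pi ** 2)
```

Both printed values are close to 0.405, that is, about 41%.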

8. Concluding remarks. An interesting similarity exists between the q'-test
of independence and a test of equal location parameters in two distributions,
constructed in the following way. Suppose that two samples of equal size, say k,
are drawn independently from two distributions. Compute the number of
individuals, say r, in the first sample, falling short of the median of the pooled
samples. Then the distribution of 2r under the null hypothesis is the same as
that of $n_1$ in the q'-test for sample size 2k (or 2k + 1). The test based on r was
discussed by F. Mosteller [7].

Another similarity is between the q'-test and a special case of the exact test of
independence in a 2 x 2 table [8]. If in such a table the marginals happen to be cut
at the 50% points the two test procedures become identical.

REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] F. Esscher, "On a method of determining correlation from the ranks of a variate",
Skandinavisk Aktuarietidskrift, Vol. 7 (1924), p. 201.

[3] W. Hoeffding, "A non-parametric test of independence", Annals of Math. Stat.,
Vol. 19 (1948), p. 546.

[4] M. G. Kendall, "A new measure of rank correlation", Biometrika, Vol. 30 (1938), p. 81.

[5] F. Mosteller, "On some useful 'inefficient' statistics", Annals of Math. Stat., Vol. 17
(1946), p. 377.

[6] C. Spearman, "The proof and measurement of association between two things", Am.
Jour. of Psych., Vol. 15 (1904), p. 88.

[7] F. Mosteller, "On some useful 'inefficient' statistics", unpublished thesis, Princeton
University, 1946.

[8] R. A. Fisher, Statistical Methods for Research Workers, 8th Ed., Stechert & Co., 1941.



SOME TWO SAMPLE TESTS 

By Douglas G. Chapman¹

University of Washington

1. Introduction and summary. Stein [4] has exhibited a double sampling procedure
to test hypotheses concerning the mean of normal variables with power
independent of the unknown variances. This procedure is here adapted to test
hypotheses concerning the ratio of means of two normal populations, also with
power independent of the unknown variances. The use of a two sample procedure
in a regression problem is also considered.

Let $\{X_{ij}\}$ (i = 1, 2) (j = 1, 2, 3, ...) be independent random variables
distributed according to $N(m_i, \sigma_i^2)$; all parameters are assumed to be unknown.

Defining k by the equation

(1) $m_1 = k m_2$

we wish to test the hypothesis H that k has a specified value $k_0$.

If $k_0 = 1$ the hypothesis H reduces to a classical problem, often referred to
in the literature as the Behrens-Fisher problem (cf. Scheffé [3] for a bibliography).
At the present time it is still an open question whether it is possible (or desirable)
to find a non-trivial single sample test for H with the size of the critical region
independent of $\sigma_1$ and $\sigma_2$. In any case it is a simple extension of the result of
Dantzig [1] (cf. also Stein [4]) to show that no non-trivial single sample test
exists whose power is independent of $\sigma_1$ and $\sigma_2$.

On the other hand the case $k_0 \ne 1$ may be expected to occur frequently in
fields of application where a choice must be made between different products,
methods of experimentation etc. which involve different costs. The statistician
must make a choice on the basis of results relative to the ratio of costs involved.
Nevertheless this problem appears to have received little attention in the
literature.

In general tests based on a two-sample procedure may not be as "efficient"
in the sense of Wald [5] as a strict sequential procedure. On the other hand the
two sample procedure reduces the number of decisions to be made by the experimenter
and it will, in certain fields, simplify the experimental procedure.

¹ This research was carried out while the author was at the University of California,
Berkeley, and was supported in part by the Office of Naval Research.

2. The two sample procedure. Stein's double sampling procedure (which may
be denoted procedure S) to test a hypothesis concerning the mean of a normal
population consists briefly in the following steps:

(a) Choose "a priori" a positive number z and a preliminary sample size n.

(b) Take n independent observations $x_1, \cdots, x_n$ of the random variable X
which is assumed to be distributed according to $N(m, \sigma^2)$ with unknown mean m
and unknown variance $\sigma^2$, and calculate

(2) $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$

(c) Let $N = \max\left\{\left[\frac{s^2}{z}\right] + 1,\ n + 1\right\}$ where [r] = largest integer $\le r$.

(d) Take $N - n$ more independent observations of X and choose a set of
constants $a_1, \cdots, a_N$ such that

(3) (i) $\sum_{i=1}^{N} a_i = 1$, (ii) $a_1 = a_2 = \cdots = a_n$, (iii) $s^2 \sum_{i=1}^{N} a_i^2 = z$.

(e) Then $\frac{\sum_{i=1}^{N} a_i x_i - m}{\sqrt{z}}$ has Student's t-distribution with $n - 1$ degrees of
freedom.

Stein further showed that the procedure may be modified to some advantage
in problems dealing with a single population. This modification is not applicable
in the problems under consideration here.

There remains to be discussed briefly the choice of n, z and the a's. The preliminary
sample size n may be determined by other considerations or it may be
chosen as part of the design of the experiment. Hodges [2] has shown that the
expected value of the total sample size N and the power of the test both depend
on the choice of n and he has discussed the optimum choice of n with respect
to the modified procedure of Stein. In general this optimum choice of n depends
upon prior knowledge concerning the variance.

The power of the test will depend upon z; some considerations concerning
the choice of z will be dealt with after discussing the tables upon which the
two sample tests are based.

The arbitrariness involved in choosing the a's may be eliminated by placing
the additional requirement that

(4) $a_{n+1} = a_{n+2} = \cdots = a_N = b$ (say).

Letting $a_1 = a_2 = \cdots = a_n = a$ it is elementary to solve for a and b explicitly,
viz.,

(5) $na + (N - n)b = 1, \qquad na^2 + (N - n)b^2 = \frac{z}{s^2}.$

The solutions are

(6) $b = \frac{1}{N}\left[1 + \sqrt{\frac{n(Nz - s^2)}{(N - n)s^2}}\right],$

(7) $a = \frac{1 - (N - n)b}{n}.$
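
Steps (b) to (d) of procedure S, with the weights restricted as in (4), amount to a small computation once the preliminary sample is in hand. The Python sketch below is one way to carry it out; the function name `stein_constants` is hypothetical, and the formula used for b is the root of (5) with the positive square root, which is an assumption about the form of (6).

```python
from math import floor, sqrt
from statistics import variance


def stein_constants(first_sample, z):
    """Return (N, a, b) for procedure S given the preliminary sample and z > 0.

    a is the common weight of the first n observations, b that of the
    N - n additional ones, chosen so that conditions (3) hold.
    """
    n = len(first_sample)
    s2 = variance(first_sample)            # divisor n - 1, as in (2)
    N = max(floor(s2 / z) + 1, n + 1)      # step (c): total sample size
    u = z / s2
    # N > s2 / z guarantees N * u - 1 > 0, so the square root is real.
    b = (1 + sqrt(n * (N * u - 1) / (N - n))) / N   # equation (6)
    a = (1 - (N - n) * b) / n                       # equation (7)
    return N, a, b


if __name__ == "__main__":
    N, a, b = stein_constants([1.0, 2.0, 4.0, 7.0], z=0.5)
    print(N, a, b)
```

With the preliminary sample 1, 2, 4, 7 and z = 0.5 one finds $s^2 = 7$ and N = 15, and the returned weights satisfy both equations of (5) up to rounding error.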





3. Test for H. The steps involved in testing the hypothesis H are

(a) Choose $z_1$, $z_2$ so that

(8) $\frac{z_1}{z_2} = k_0^2.$

(b) Carry out procedure S with the same n for each population, determining
two statistics $T_1$, $T_2$; i.e.

(9) $T_i = \frac{\sum_{j=1}^{N_i} a_{ij} X_{ij}}{\sqrt{z_i}}$ (i = 1, 2).

Then $T_1 - T_2$ has, under the hypothesis tested, the distribution of the difference
of two independent Student variables.

If s denotes the difference of two independent random variables $t_1$ and $t_2$
each distributed according to Student's t-distribution with $n - 1$ degrees of
freedom and if $s_0$ is defined by the equation

$P(|s| > s_0) = \alpha,$

then a test of size $\alpha$ is given by the rule: H is rejected if $|T_1 - T_2| > s_0$.


4. The distribution of differences of Student variables. The distribution of s
is easily found by the method of characteristic functions, in case n is even.
Let $m = n - 1$ and to simplify slightly put

(10) $y_i = \frac{t_i}{\sqrt{m}}$ (i = 1, 2).

Then the density function of $y_i$ is

(11) $f(y) = \frac{\Gamma\left(\frac{m+1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{m}{2}\right)} \cdot \frac{1}{(1 + y^2)^{(m+1)/2}}$

and its characteristic function

(12) $\varphi_y(t) = \int_{-\infty}^{\infty} e^{ity} f(y)\, dy$


(13) 



Ttt— T)/2 


(rn— 

£ 


(- 


m — 1 


m' 


'm — 


+ r) I 



[2 (lfl)] 


(ra-l)/2- 


Formula (13) may be obtained by contour integration; it is, however, a standard
formula in connection with Bessel functions of the second kind of purely imaginary
argument (cf. Watson [6], pp. 80, 185-188).





While it is not possible to obtain a simple general expression for

(14) $f(w) = \frac{1}{\pi} \int_0^{\infty} \cos(wt)\, [\varphi_y(t)]^2\, dt,$

the density function of $w = s/\sqrt{m}$, this integral may be evaluated for m = 1, 3, 5,
etc. and furthermore the density function of s may be integrated in a closed form
for such values of m, and consequently tabulated fairly easily.

In case n is odd it is possible to express $\varphi_y(t)$ in terms of Bessel functions but
the Bessel functions obtained are not expressible in a closed form. While the
problem may be attacked directly by numerical integration, it will generally be
sufficient to interpolate in Table I where necessary, for such values of n.

Table I gives the distribution of s for n = 2, 4, 6, 8, 10, 12. For larger values
of n it may be sufficiently accurate to use the normal approximation to the
distribution of s. In virtue of the asymptotic normality of the t-distribution s
will be distributed approximately normally with mean zero and variance
$\frac{2(n - 1)}{n - 3}$ for n sufficiently large.
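
Two cases of the critical value $s_0$ can be computed without any tables, and the Python sketch below does so. It uses the normal approximation with variance $2(n - 1)/(n - 3)$ for larger n, and for n = 2 it uses the fact that each Student variable then has one degree of freedom (a Cauchy variable), so that the difference is a Cauchy variable with scale 2. The function names are hypothetical.

```python
from math import pi, sqrt, tan
from statistics import NormalDist


def s0_normal_approx(n, alpha):
    """Two-sided critical value from the normal approximation, n > 3."""
    sd = sqrt(2 * (n - 1) / (n - 3))
    return NormalDist().inv_cdf(1 - alpha / 2) * sd


def s0_exact_n2(alpha):
    """Exact two-sided critical value for n = 2: each t has one degree of
    freedom (Cauchy), so s is Cauchy with scale 2 and
    P(|s| > c) = 1 - (2/pi) * arctan(c / 2)."""
    return 2 * tan(pi * (1 - alpha) / 2)


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        print(alpha, round(s0_exact_n2(alpha), 2), round(s0_normal_approx(12, alpha), 2))
```

At the .05 and .01 levels this gives roughly 25.41 and 127.3 for n = 2, and 3.06 and 4.03 for the normal approximation with n = 12.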


5. Power of the test. Writing

(15) $T = T_1 - T_2$ and $\Delta = \frac{m_1}{\sqrt{z_1}} - \frac{m_2}{\sqrt{z_2}}$

it is seen that $T = s + \Delta$ and hence

(16) $P(H \text{ is rejected}) = P(|T| > s_0) = P(s < -s_0 - \Delta) + P(s > s_0 - \Delta).$

Since, by (1) and (8),

$\Delta = \frac{m_2}{\sqrt{z_2}}\left(\frac{k}{k_0} - 1\right),$

equation (16) may be used as a guide in choosing $z_2$ so that a certain minimum
power is attained; the presence of the nuisance parameter $m_2$ makes impossible
the determination of $z_2$ so as to give exactly some preassigned power.

Since s is distributed independently of $\sigma_1$, $\sigma_2$, it follows that the power of the
test is independent of these parameters. Using the addition formula to express
the frequency function of s in terms of the frequency function of Student's
t-distribution, it may be shown that $f(s)$ is unimodal and symmetrical about
$s = 0$. Hence the test is unbiased. It also follows from (16) that if $z_2$ is made to
approach zero the probability of rejecting H when it is false tends to 1: i.e.
the test is consistent.

It may be observed that tests for the one-sided hypotheses

$\frac{m_1}{m_2} > k_0$ or $\frac{m_1}{m_2} < k_0$

may easily be formulated. Table II provides a table useful for such tests also,
at half the indicated significance levels.


TABLE I

Distribution of s: difference of two independent Student variables with n - 1 degrees of freedom

The value tabled is $P(0 \le s \le s_0)$


TABLE II

The value tabled is $s_0$

Significance level        n = 2     4       6      8      10     12     Normal approximation for n = 12
P(|s| >= s_0) = .05       25.41    10.82   3.62   3.34   3.18   3.10   3.06
P(|s| >= s_0) = .01       127.3    36.8    5.38   4.72   4.42   4.26   4.03


6. A regression problem. We consider the problem where $x_i$ are values of a
sure variable, $Y_i$ are independent random variables with

(17) $E(Y_i) = a + b x_i$

and $\sigma_{Y_i}$ is unknown. It is desired to estimate a and b and to test the hypothesis
$b = b_0$.














Tlieusual procedure is to iisaurne <ry, constant, and use the Markov theorem 
(i.e. the standard least scpiarea formulao). lu this way unbiased estimates of 
a and i) are obtained, whether or not this assumption is fulfilled. However the 
usual significance test for h is not valid if this tissumption (plus normality of 
the F’s) is not fulfilleti. 

The two sample procedure leads to a valid test of the hypothesis $b = b_0$, with
power independent of the unknown variance. Since linearity of the expected
value of Y on x is assumed, the optimum procedure is to observe Y for only two
values of x, at opposite ends of the range. Let these points be $x_1$, $x_2$. For these
values of x, the procedure may be used (choosing $z_1 = z_2 = z$) to determine $T_1$, $T_2$,
where $[T_i - (a + b x_i)]/\sqrt{z}$ has Student's t-distribution with n - 1 degrees of
freedom.

Then the following estimates of a, b are unbiased, for n > 3,

$\hat{a} = \frac{x_1 T_2 - x_2 T_1}{x_1 - x_2}, \qquad \hat{b} = \frac{T_1 - T_2}{x_1 - x_2}.$

To test the hypothesis $H_0$: $b = b_0$ it is necessary only to calculate the statistic
$s = [(T_1 - T_2) - b_0(x_1 - x_2)]/\sqrt{z}$ and reject $H_0$ at the $\alpha$ level of
significance if $|s| > s_0$, where $s_0$ was defined above (Section 3).

It is seen that if b' is the true value of b, then the power of the test is a function
of $(b' - b_0)(x_1 - x_2)/\sqrt{z}$, and z may be determined to obtain any prescribed power
desired. It is also immediate that the power of the test is independent of $\sigma_{Y_i}$.
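The arithmetic of this two-point test is short enough to sketch. The code below assumes that $T_1$, $T_2$ and z have already been produced by the two sample procedure (which is not reproduced here), takes the estimates of a and b to be those of the straight line through $(x_1, T_1)$ and $(x_2, T_2)$, and takes $s_0$ from Table II. The function names are hypothetical.

```python
import math


def two_point_regression(T1, T2, x1, x2, z, b0):
    """Return (a_hat, b_hat, s) for observations taken at x1 and x2 only.

    Assumption: [T_i - (a + b * x_i)] / sqrt(z) is a Student variable for
    i = 1, 2, so that s is the difference of two independent Student
    variables when b = b0.
    """
    if x1 == x2:
        raise ValueError("x1 and x2 must differ")
    if z <= 0:
        raise ValueError("z must be positive")
    b_hat = (T1 - T2) / (x1 - x2)
    a_hat = (x1 * T2 - x2 * T1) / (x1 - x2)
    s = ((T1 - T2) - b0 * (x1 - x2)) / math.sqrt(z)
    return a_hat, b_hat, s


def reject(s, s0):
    """Two-sided decision: reject b = b0 when |s| exceeds the tabled s0."""
    return abs(s) > s0


if __name__ == "__main__":
    a_hat, b_hat, s = two_point_regression(1.0, 5.0, 0.0, 2.0, z=4.0, b0=1.0)
    print(a_hat, b_hat, s, reject(s, s0=3.62))
```

Since the power depends on $(b' - b_0)(x_1 - x_2)/\sqrt{z}$, widening the spacing of the two x values has the same effect on power as reducing z.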

The author wishes to express thanks to the members of the computing staff
of the Statistical Laboratory, University of California, Mrs. E. Putz, Miss J.
Linton, and Mr. J. Blum, for assistance in preparing Tables I and II.*

REFERENCES 

[1] George B. Dantzig, "On the non-existence of tests of 'Student's' hypothesis having
power functions independent of σ", Annals of Math. Stat., Vol. 11 (1940), p. 186.

[2] Joseph L. Hodges, Jr., "The selection of initial sample size in the Stein two sample
procedure", unpublished dissertation, University of California, Berkeley, 1948.

[3] Henry Scheffé, "On solutions of the Behrens-Fisher problem based on the t-distribution",
Annals of Math. Stat., Vol. 14 (1943), p. 35.

[4] Charles Stein, "A two sample test for a linear hypothesis whose power is independent
of the variance", Annals of Math. Stat., Vol. 16 (1945), p. 243.

[5] Abraham Wald, Sequential Analysis, John Wiley and Sons, Inc., 1947.

[6] G. N. Watson, A Treatise on the Theory of Bessel Functions, Cambridge University
Press, 1944.


* It has been pointed out to the writer that percent points of linear combinations of
two independent Student t's are given in Table VI (by P. V. Sukhatme) in R. A. Fisher
and F. Yates, Statistical Tables for Biological, Medical and Agricultural Research, Oliver
and Boyd, Edinburgh, 1943 (added in page proof).



NOTES 

This section is devoted to brief research and expository articles and other short items. 


TRANSFORMATIONS RELATED TO THE ANGULAR AND 
THE SQUARE ROOT 

By Murray F. Freeman and John W. Tukey¹
Princeton University 

1. Summary. The use of transformations to stabilize the variance of binomial
or Poisson data is familiar (Anscombe [1], Bartlett [2, 3], Curtiss [4], Eisenhart
[5]). The comparison of transformed binomial or Poisson data with percentage
points of the normal distribution to make approximate significance tests or
to set approximate confidence intervals is less familiar. Mosteller and Tukey [6]
have recently made a graphical application of a transformation related to the
square-root transformation for such purposes, where the use of "binomial
probability paper" avoids all computation. We report here on an empirical study
of a number of approximations, some intended for significance and confidence
work and others for variance stabilization.

For significance testing and the setting of confidence limits, we should like
to use the normal deviate K exceeded with the same probability as the number of
successes x from n in a binomial distribution with expectation np, which is
defined by

$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{K} e^{-t^2/2}\, dt = \mathrm{Prob}(x \le k \mid \text{binomial}, n, p).$

The most useful approximations to K that we can propose here are N (very
simple), $N^+$ (accurate near the usual percentage points), and $N^{**}$ (quite accurate
generally), where

$N = 2\left(\sqrt{(k+1)q} - \sqrt{(n-k)p}\right).$

(This is the approximation used with binomial probability paper.)

$N^+ = N + \frac{N + 2p - 1}{12\sqrt{E}}, \qquad E = \text{lesser of } np \text{ and } nq,$

$N^* = N + \frac{(N - 2)(N + 2)}{12}\left(\frac{1}{\sqrt{np + 1}} - \frac{1}{\sqrt{nq + 1}}\right),$

$N^{**} = N^* + \frac{N^* + 2p - 1}{12\sqrt{E}}, \qquad E = \text{lesser of } np \text{ and } nq.$
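The approximations are easy to compare with the exact deviate on a machine. The following is a minimal sketch using only the Python standard library; it reads the formulas as $N = 2(\sqrt{(k+1)q} - \sqrt{(n-k)p})$, $N^+ = N + (N + 2p - 1)/(12\sqrt{E})$, $N^* = N + \frac{(N-2)(N+2)}{12}(1/\sqrt{np+1} - 1/\sqrt{nq+1})$ and $N^{**} = N^* + (N^* + 2p - 1)/(12\sqrt{E})$, with E the lesser of np and nq. The function and key names are illustrative.

```python
import math
from statistics import NormalDist


def exact_K(k, n, p):
    """Normal deviate K with Phi(K) = Prob(x <= k | binomial, n, p).

    Intended for 0 <= k < n; inv_cdf raises an error if the cumulative
    probability rounds to exactly 0 or 1.
    """
    q = 1.0 - p
    cdf = sum(math.comb(n, j) * p**j * q**(n - j) for j in range(k + 1))
    return NormalDist().inv_cdf(cdf)


def approximations(k, n, p):
    """Return the four approximations to K as a dictionary."""
    q = 1.0 - p
    E = min(n * p, n * q)
    N = 2.0 * (math.sqrt((k + 1) * q) - math.sqrt((n - k) * p))
    N_plus = N + (N + 2.0 * p - 1.0) / (12.0 * math.sqrt(E))
    N_star = N + (N - 2.0) * (N + 2.0) / 12.0 * (
        1.0 / math.sqrt(n * p + 1.0) - 1.0 / math.sqrt(n * q + 1.0)
    )
    N_2star = N_star + (N_star + 2.0 * p - 1.0) / (12.0 * math.sqrt(E))
    return {"N": N, "N+": N_plus, "N*": N_star, "N**": N_2star}


if __name__ == "__main__":
    n, p = 50, 0.1          # the case n = 50, np = 5
    for k in range(0, 12):
        K = exact_K(k, n, p)
        errors = {name: value - K for name, value in approximations(k, n, p).items()}
        print(k, round(K, 3), {name: round(err, 3) for name, err in errors.items()})
```

The printed errors give a quick feel for how much each added correction term buys in a particular case.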

For variance stabilization, the averaged angular transformation

$\sin^{-1}\sqrt{\frac{x}{n+1}} + \sin^{-1}\sqrt{\frac{x+1}{n+1}}$

has variance within ±6% of

$\frac{1}{n + \frac{1}{2}}$ (angles in radians), $\qquad \frac{821}{n + \frac{1}{2}}$ (angles in degrees),

for almost all cases where $np \ge 1$.

In the Poisson case, this simplifies to using

$\sqrt{x} + \sqrt{x + 1}$

as having variance 1.

¹ Prepared in connection with research sponsored by the Office of Naval Research.

2. Significance testing. In addition to the approximations mentioned above,
empirical study was also made of the following:

$L = \frac{x - np}{\sqrt{npq}},$

$L^*$ = L modified by a term like that in $N^*$,
ilf “ 2 Vn'+l ^sin"' /|/"^ “ s’a'’ , 

$M^*$ = M modified by a term like that in $N^*$.

Taking an upper limit of 2.5 or 3.5 on |K| and a lower limit of 0.01, 1, or 4
on np, the greatest observed errors of the approximations were smallest for
N** and M* and largest for the direct approximations L and L*. This
was true for all six choices of region.

If we exclude the cases k = 0 and k = n, where the desired probability can be
calculated directly, the largest observed errors in the substantial number of
cases computed, which are probably representative of the regions where the
approximations are worst, were as follows:


IK| 

E •* np 

V** 

M* 

»• 

Uu-gest obMTved error of 

m N M 


t 

^2,6 

£4 

.04 

.07 

.08 

.14 


.17 

.26 

.35 



.04 

.09 

.13 

.19 

.20 

.24 

.36 

,42 


^0.01 

,04 

.20 

,20 

.19 


.66 

.62 

.80 

<3.6 

S4 

.08 

.07 

.08 

.10 

.26 



.63 


i.1 

,U 

,10 

.17 

.21 

.38 


BKil 

1,26 


SO.Ol 

.11 

,51 

.60 

,21 

.06 


6.88 

3.42 


Within the range of great interest, |K| < 2.5, that is .0062 < probability
< .9938, we have errors of less than 0.04 in N** and less than 0.20 in N.

For 1.6 < |K| < 2.5, the range of greatest interest, the average error of
$N^+$ was less than 0.03 and the maximum was 0.08 (54 cases considered).








Thus, we can recommend

N as a simple and usually accurate transformation,

$N^+$ for rapid significance testing,

N** for adequate accuracy at all levels.

Figure 1 shows the behavior of the various approximations in the case n = 50,
np = 5. This is roughly typical.



Fig. 1. Errors of approximation.


3. Variance stabilization. The various suggestions for stabilizing the variance
of the Poisson are:

$\sqrt{x + \tfrac{1}{2}}$ (Bartlett [2]),

$\sqrt{x + \tfrac{3}{8}}$ (Anscombe [1]),

$\sqrt{x} + \sqrt{x + 1}$.

Figure 2 shows the variance of these transformations as a function of the
Poisson expectation. Clearly $\sqrt{x} + \sqrt{x + 1}$ is better than the rest if small expectations
are to be considered. The simplicity with which it can be read from a square-root
table, and its unit variance, are also favorable.

When an approximation of a given form is to work over as wide a range as
possible without the magnitude of its errors exceeding a certain limit, the optimum
approximation is almost certain to involve errors of both signs. If ±6%
variation in variance is permissible, $\sqrt{x} + \sqrt{x + 1}$ is usable for expectations
of unity or more. It is not surprising that Anscombe's approximation, obtained
by eliminating the term in $m^{-1}$, and dominated by the term in $m^{-2}$, should only
meet the ±6% tolerance for expectations of 2.2 or more.
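The variance of each transformation under a Poisson law can be computed directly by summing over the probability function. The sketch below does so with the standard library only. Doubling the Bartlett and Anscombe forms, so that all three transformations aim at unit variance, is a convenience of this sketch and not something stated in the text, and the function names are illustrative.

```python
import math


def poisson_pmf(mean, tol=1e-16):
    """List of P(X = 0), P(X = 1), ... until the upper tail is below tol."""
    pmf = []
    x, p = 0, math.exp(-mean)
    while x <= mean or p > tol:
        pmf.append(p)
        x += 1
        p *= mean / x
    return pmf


def variance(transform, mean):
    """Variance of transform(X) when X is Poisson with the given mean."""
    pmf = poisson_pmf(mean)
    m1 = sum(p * transform(x) for x, p in enumerate(pmf))
    m2 = sum(p * transform(x) ** 2 for x, p in enumerate(pmf))
    return m2 - m1 * m1


def bartlett(x):
    return 2.0 * math.sqrt(x + 0.5)      # doubled: nominal variance 1


def anscombe(x):
    return 2.0 * math.sqrt(x + 0.375)    # doubled: nominal variance 1


def freeman_tukey(x):
    return math.sqrt(x) + math.sqrt(x + 1.0)


if __name__ == "__main__":
    for mean in (0.5, 1.0, 2.0, 2.2, 4.0, 16.0):
        print(mean, [round(variance(f, mean), 3)
                     for f in (bartlett, anscombe, freeman_tukey)])
```

Running the loop over a finer grid of expectations gives a numerical counterpart to Figure 2.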



4. Scope. Values of K, and with some occasional exceptions, of L, L*,
M*, N, N* and N** were calculated for

n = 2, 5, 10, 20, 100,

p = 1%, 2%, 5%, 10%, 20%, 30%, 40%, 60%,

k giving K < 4.6,

and similar computations were made for the Poisson case with expectations
1/100, 1/50, 1/20, 1/10, 1/5, 1/2, 1, 2, 4, 8, 16, 32, 64.





These computations were made to only two decimal places, so that the final 
results may easily err by 1, 2, or 3 in the second decimal place. 

A more complete discussion of the problem, the origin of the approximations,
and tables showing a representative collection of actual values can be found in
Memorandum Report 24 of the Statistical Research Group, Princeton University,
which bears the same title as this note. Copies may be obtained from its
Secretary, Box 708, Princeton, N. J.

REFERENCES 

[1] F. J. Anscombe, "The transformation of Poisson, binomial, and negative binomial
data", Biometrika, Vol. 35 (1948), pp. 246-254.

[2] M. S. Bartlett, "The square root transformation in the analysis of variance", Jour.
Roy. Stat. Soc., Suppl., Vol. 3 (1936), pp. 68-78.

[3] M. S. Bartlett, "The use of transformations", Biometrics, Vol. 3 (1947), pp. 39-51.

[4] J. H. Curtiss, "On transformations used in the analysis of variance", Annals of Math.
Stat., Vol. 14 (1943), pp. 107-122.

[5] Churchill Eisenhart, "The assumptions underlying the analysis of variance",
Biometrics, Vol. 3 (1947), pp. 1-21.

[6] Frederick Mosteller and John W. Tukey, "The uses and usefulness of binomial
probability paper", Jour. Am. Stat. Assn., Vol. 44 (1949), pp. 174-212.


REMARK ON THE ARTICLE “ON A CLASS OF DISTRIBUTIONS THAT 
APPROACH THE NORMAL DISTRIBUTION FUNCTION” BY 
GEORGE B. DANTZIG¹

By T. N. E. Greville
Federal Security Agency 

In this interesting and valuable article, Dr. Dantzig showed that, under
certain conditions, a sequence of frequency distributions connected by a linear
recurrence formula converges to the normal distribution. Among several applications
of his results which are discussed, the author mentions their relation to
certain types of smoothing formulas, and has shown that if a linear smoothing
formula and the data to which it is applied satisfy certain conditions, the iteration
of the smoothing process produces a sequence of smoothed distributions which,
upon normalization, approaches the normal frequency curve.
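The convergence can be watched numerically in the simplest case. The sketch below assumes non-negative weights (three weights of 1/3), data consisting of a single unit frequency, and values equal to zero outside the given range; these choices and the function names are illustrative and are not Dantzig's general conditions.

```python
import math


def smooth(values, weights):
    """Apply a linear smoothing formula once (full convolution).

    Values outside the given range are treated as zero, so the output is
    longer than the input by len(weights) - 1.
    """
    out = [0.0] * (len(values) + len(weights) - 1)
    for i, v in enumerate(values):
        for j, w in enumerate(weights):
            out[i + j] += v * w
    return out


def iterate(values, weights, times):
    """Apply the same smoothing formula the given number of times."""
    for _ in range(times):
        values = smooth(values, weights)
    return values


def normal_density(x, variance):
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


if __name__ == "__main__":
    times = 30
    weights = [1 / 3, 1 / 3, 1 / 3]
    smoothed = iterate([1.0], weights, times)
    variance = times * 2.0 / 3.0     # each pass adds the weight variance 2/3
    worst = max(abs(v - normal_density(i - times, variance))
                for i, v in enumerate(smoothed))
    print(len(smoothed), sum(smoothed), worst)
```

After thirty passes the smoothed frequencies differ from the normal ordinates with the same variance by a small fraction of the central ordinate.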

In a summary paragraph at the end of the article, it is stated that "successive
application of one or many such linear formulas will usually smooth any set of
values to the normal curve of error." The entire article was concerned with
frequency distributions, and a careful reading makes it clear that the author

¹ Annals of Math. Stat., Vol. 10 (1939), pp. 247-253.




restrictions imposed on both the original data and the smoothing formula as
they are stated only by implication, and not explicitly, even though they have
the effect of excluding important classes of smoothing formulas, such as those
commonly employed by actuaries.

The approach to the normal distribution is shown to depend on the vanishing
of a certain limit denoted as F', which is a function of the moments of the original
data and of a distribution in which the weights employed in the smoothing
formula are interpreted as frequencies. At this point, objection may be taken
to Dr. Dantzig's proof, since the smoothing formulas most frequently used
contain negative weights. However, it has been shown elsewhere² that the
occurrence of negative weights will not of itself prevent the sequence of smoothed
distributions from approaching the normal curve. A somewhat more serious
difficulty arises if, as is commonly the case, the smoothing formula has the
property of reproducing polynomials of a specified degree. If the degree reproduced
is two or more, this implies the vanishing of the second moment of the
weight distribution, in which case the limit F' does not vanish. In fact, it has
been shown by De Forest³ and Schoenberg that the iteration of smoothing
formulas which reproduce polynomials of higher degree gives rise to a sequence
of limiting distributions which have the general appearance of the normal curve
in the center portion and of a damped sine curve in the tails. This is, however, at
best, a technical exception to Dantzig's statement, as one is still faced with his
basic proposition that repeated application of a smoothing formula to a frequency
distribution will cause the smoothed distribution to be dominated by the
characteristics of the smoothing formula rather than those of the original data.

While he did not intend the statement to refer to data not in the form of a
frequency distribution, some readers seem to have interpreted it as being of
general application, and, for that reason, I should like to point out a few of the
considerations involved in applying iterated smoothing to other types of data,
such as, for example, a time series or the values of a mathematical function.
The limit F', on whose vanishing Dantzig's theorem depends, involves the
second and fourth moments of the original data (as well as of the weight
distribution) and, therefore, can be computed only if these moments exist. For
this it is necessary (but, of course, not sufficient) that the function being smoothed
shall tend toward zero as the independent variable approaches positive or
negative infinity.

In order to iterate a smoothing formula an infinite number of times, it is
obviously necessary to have an infinite set of original values. Therefore, in
smoothing, for example, a finite time series, one would have to make some
assumption regarding the values of the series outside the range for which they
are actually available. Of course, if it were assumed that the values were zero
outside this range, Dantzig's theorem would apply. However, under this assumption,
infinite iteration of a smoothing formula would not be a rational procedure,
as it would smooth each value to zero, and the incidental fact that the sequence
of smoothed distributions, while approaching zero, also approaches the form of a
normal distribution, would not be a very valuable one. In this connection, an
important distinction between time series and frequency data is that, in dealing
with the former, one is interested in the magnitude of individual values as well
as in the general form and shape of the distribution. In practice it might be
preferable not to make any assumption about the values outside the given
range but rather to employ special devices to obtain smoothed values near the
ends of this range. In such a case, the smoothing process would be a function
of the range (if not of the actual values) of the original data distribution. Such a
process was not considered by Dantzig, and is clearly excluded by his definition
of a linear smoothing formula, which requires that the formula be completely
independent of the data to which it is applied.

² I. J. Schoenberg, "Some analytical aspects of the problem of smoothing," Courant
Anniversary Volume, Interscience Publishers, New York, 1948.

³ H. H. Wolfenden, "On the development of formulae for graduation by linear
compounding, with special reference to the work of Erastus L. De Forest," Trans. Actuarial
Soc. Am., Vol. 26 (1925), pp. 81-121.

The somewhat academic question of the effect of iteration of a smoothing
formula on a function of infinite range for which the moments do not exist is a
difficult one, to which I cannot give a general answer. Schoenberg does not
consider this problem, but merely gives the weight distribution to be applied
to the original data in order to obtain the limiting smoothed distribution. Two
trivial examples may, however, serve to illustrate the nature of the considerations
involved. If the original data are values of a polynomial of a specified degree,
and if a smoothing formula which reproduces that degree is successively applied,
it will of course continue indefinitely to reproduce the original values. On the
other hand, if the smoothing formula reproduces only polynomials of lower
degree, a bias is introduced. As a simple example, we may consider the case of
smoothing the function $y = x^2$ by a formula consisting of three weights each
equal to 1/3 to be applied to the given value and its two immediate neighbors.
Taking the neighbors at unit distance, it is easily shown that the smoothed value
is $x^2 + 2/3$, and the effect of successive application of this formula is to add 2/3
each time. Thus each smoothed value would tend toward infinity as the number
of smoothings increases; however, the entire distribution would always remain a
parabola of the same form as the original.
Finally, I should like to emphasize that, in common with Dr. Dantzig, I
do not regard infinite repetition of the smoothing operation as a practical procedure,
but consider it preferable to select, in the first instance, a smoothing
formula which is likely to have the desired effect and then to perform the smoothing
in a single step. In this way, one is more likely to secure the result desired
without losing sight of important characteristics of the original data.





INDEPENDENCE OF QUADRATIC FORMS IN NORMALLY 
CORRELATED VARIABLES¹

By Yukiyosi Kawada
Tokyo University of Literature and Science

The problem of giving a necessary and sufficient condition that two quadratic
forms in normally correlated variables be independent has been treated by many
authors [1], [2], [3], [4], [5]. We shall give here also a solution of this problem,
which may be regarded as a generalization of that given by B. Matérn [6] for nonnegative
quadratic forms to the general case.

THEOREM 1. If two quadratic forms

(1) $Q_1 = \sum_{i,j=1}^{n} a_{ij} x_i x_j, \qquad Q_2 = \sum_{i,j=1}^{n} b_{ij} x_i x_j$

in normally correlated variables $x_1, \ldots, x_n$ with zero means and with the variance
matrix I satisfy the following four conditions

(2) $F_{ij} = E(Q_1^i Q_2^j) - E(Q_1^i)\,E(Q_2^j) = 0 \quad (i, j = 1, 2),$

then the relation

(3) $AB = 0 \quad (A = (a_{ij}),\ B = (b_{ij}))$

holds.

COROLLARY 1. If $Q_1$, $Q_2$ in (1) satisfy the four conditions (2), then $Q_1$ and $Q_2$
are independent.

COROLLARY 2. (Necessity portion of the theorem of Craig.) A necessary
condition for the independence of $Q_1$ and $Q_2$ is AB = 0. (The sufficiency was
proved by Craig.)

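PROOF OF THEOREM 1. The proof is very simple. Using the values $E(x_k^i) = 0$
(i = 1, 3, 5, 7), $E(x_k^2) = 1$, $E(x_k^4) = 3$, $E(x_k^6) = 15$, $E(x_k^8) = 105$ (k = 1, ..., n),
we have by a straightforward calculation² the following relations

(4) $F_{11} = 2\,\mathrm{Tr}(AB),$

(5) $F_{12} = 8\,\mathrm{Tr}(AB^2) + 4\,\mathrm{Tr}(AB)\,\mathrm{Tr}(B),$

(6) $F_{21} = 8\,\mathrm{Tr}(A^2B) + 4\,\mathrm{Tr}(AB)\,\mathrm{Tr}(A),$

(7) $F_{22} = 32\,\mathrm{Tr}(A^2B^2) + 16\,\mathrm{Tr}((AB)^2) + 16\,\mathrm{Tr}(AB^2)\,\mathrm{Tr}(A) + 16\,\mathrm{Tr}(A^2B)\,\mathrm{Tr}(B)$

$\qquad\qquad + 8\,\mathrm{Tr}(AB)\,\mathrm{Tr}(A)\,\mathrm{Tr}(B) + 8\,(\mathrm{Tr}(AB))^2.$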
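Relation (4) can be checked by brute force for small matrices. The sketch below assumes independent standard normal variables, for which $E(x_i x_j x_k x_l) = \delta_{ij}\delta_{kl} + \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}$ (this reproduces $E(x_k^4) = 3$), and evaluates $F_{11} = E(Q_1 Q_2) - E(Q_1)E(Q_2)$ term by term. The function names and the test matrices are illustrative.

```python
from itertools import product


def fourth_moment(i, j, k, l):
    """E(x_i x_j x_k x_l) for independent standard normal variables."""
    def delta(a, b):
        return 1 if a == b else 0
    return delta(i, j) * delta(k, l) + delta(i, k) * delta(j, l) + delta(i, l) * delta(j, k)


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def trace_of_product(A, B):
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))


def F11(A, B):
    """Covariance of Q1 = x'Ax and Q2 = x'Bx, summed over all index quadruples."""
    n = len(A)
    e_q1q2 = sum(A[i][j] * B[k][l] * fourth_moment(i, j, k, l)
                 for i, j, k, l in product(range(n), repeat=4))
    return e_q1q2 - trace(A) * trace(B)


if __name__ == "__main__":
    A = [[1, 2], [2, 0]]        # symmetric integer matrices, so the
    B = [[0, 1], [1, 3]]        # comparison below is exact
    print(F11(A, B), 2 * trace_of_product(A, B))
```

The same enumeration extends to (5)-(7) with sixth and eighth moments, at the cost of longer sums.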

¹ Presented at the Chapel Hill meeting of the Institute of Mathematical Statistics and
Biometric Society, March 18, 1950.

² If we apply an orthogonal transformation on $(x_1, \ldots, x_n)$ so that A becomes a diagonal
form, the calculation becomes simpler than with the general form. We may note here also
the fact that we need not assume that $x_1, \ldots, x_n$ are normally correlated, but we use
only the values of $E(x_k^i)$ (i = 1, ..., 8) for our proof.





Put C = AB. Let C' be the transposed matrix of C. We have from (2), (4)-(7)

(8) $2\,\mathrm{Tr}(A^2B^2) + \mathrm{Tr}((AB)^2) = 2\,\mathrm{Tr}(CC') + \mathrm{Tr}(C^2) = 0.$

The left side of (8) is equal to $\sum_{i,j} (c_{ij}^2 + c_{ij}c_{ji} + c_{ji}^2)$, which is positive
unless all $c_{ij} = 0$ (i, j = 1, ..., n). Hence we have C = AB = 0, q.e.d.

Corollary 1 follows from Theorem 1 and the theorem of Craig. Corollary 2
results from observing that independence of $Q_1$ and $Q_2$ implies (2).
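The step from (8) to C = 0 rests on an elementary identity, which the following sketch checks numerically: $2\,\mathrm{Tr}(CC') + \mathrm{Tr}(C^2)$ equals the sum over all i, j of $c_{ij}^2 + c_{ij}c_{ji} + c_{ji}^2$, and each summand is non-negative because $a^2 + ab + b^2 = (a + b/2)^2 + 3b^2/4$. The helper names and the test matrix are illustrative.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def left_side(C):
    """2 Tr(CC') + Tr(C^2), the left side of (8) with C = AB."""
    return 2 * trace(matmul(C, transpose(C))) + trace(matmul(C, C))


def sum_of_forms(C):
    """Sum over all i, j of c_ij^2 + c_ij * c_ji + c_ji^2."""
    n = len(C)
    return sum(C[i][j] ** 2 + C[i][j] * C[j][i] + C[j][i] ** 2
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    C = [[2, 7], [0, 2]]
    print(left_side(C), sum_of_forms(C))
```

Because every summand vanishes only when both $c_{ij}$ and $c_{ji}$ vanish, a zero total forces C = 0.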

B. Matérn proved that if A, B are nonnegative, then AB = 0 follows from a
unique condition $F_{11} = 2\,\mathrm{Tr}(AB) = 0$. If only one of the matrices A, B is assumed
to be nonnegative, we have

THEOREM 2. Let A be nonnegative. Then from two conditions $F_{11} = 0$, $F_{12} = 0$
in (2) follows the relation AB = 0.

PROOF. From (4), (5) follows $\mathrm{Tr}(AB^2) = 0$. Since A is nonnegative, we can
choose a real symmetric matrix $A_0$ such that $A = A_0^2$. Put $C_0 = A_0 B$. Then
we have $\mathrm{Tr}(AB^2) = \mathrm{Tr}(C_0 C_0') = 0$ and from this follows $C_0 = 0$. Hence we have
also $AB = A_0 C_0 = 0$, q.e.d.


REFERENCES 

[1] A. T. Craig, "Note on the independence of certain quadratic forms", Annals of Math.
Stat., Vol. 14 (1943), pp. 195-197.

[2] H. Hotelling, "Note on a matric theorem of A. T. Craig", Annals of Math. Stat.,
Vol. 15 (1944), pp. 427-429.

[3] H. Sakamoto, "On the independence of two statistics", Research Memoirs of Inst. of
Stat. Math., Tokyo, Vol. 1 (1944), pp. 1-26 (in Japanese).

[4] K. Matusita, "Note on the independence of certain statistics", Annals of Inst. of
Stat. Math., Tokyo, Vol. 1 (1949), pp. 79-82.

[5] J. Ogawa, "On the independence of bilinear and quadratic forms of a random sample
from a normal population", Annals of Inst. of Stat. Math., Tokyo, Vol. 1 (1949),
pp. 83-108.

[6] B. Matérn, "Independence of non-negative quadratic forms in normally correlated
variables", Annals of Math. Stat., Vol. 20 (1949), pp. 119-120.


ERRATA TO “CONTROL CHART FOR LARGEST 
AND SMALLEST VALUES” 

By John M. Howell 

Los Angeles City College 

In the paper cited in the title {Annals of Math. Stat., Vol. 20 (1949), p 306), 
there are some numerical errors m Table I Values of di!/2 and di are given by 
H. J. Godwin in “Some Low Moments of Order Statistics” in the same issue 





of the Annals. These values are more accurate than those heretofore available.
A corrected Table I based on these values is as follows:


» 

* : 

A 

At 

*^1 , 


n 

2 

' 1 mi 

.K250 

1.8800 

2 8061 

3.0111 

2 

3 

, 1.8920 

.7480 

1.0233 

1 8258 

3.0!K)2 

3 

4 

' 2.()Sf« 

“ 1)12 

.7286 

1..621K 

3.1330 

4 

5 

*2.3269 

.0090 

..6708 

1 3029 ' 

3 1099 

5 

6 

i 2.5344 

.0449 

.4832 

1 2834 ' 

3.2020 

0 

7 

; 2 7043 , 

.0280 

.4193 

1.1015 

3 2,303 

7 

8 

i 2.8472 

.GI07 

.3725 

1 1431 1 

3.2,556 

8 

9 

j 2.0700 - 

.5978 

.3307 

1.1038 

3.278-1 

9 

10 

’ 3.0775 

1 

.5868 

.3083 

! 1.0720 1 

,3.2992 

10 


ABSTRACTS OF PAPERS 

(Abstracts of papers presented at the Berkeley meeting of the Institute,
August 5, 1950)

1. Sampling from Populations with Overlapping Clusters. Z. W. Birnbaum,
University of Washington, Seattle.

In cluster sampling it is usually assumed that the clusters are disjoint. In this paper
situations are considered in which this assumption is not fulfilled. Let the population π
consist of N individuals "j", having the variates V[j], j = 1, 2, ..., N, and let K clusters
C[i], i = 1, 2, ..., K, be such that each "j" belongs to at least one cluster. Let s[j] ≥ 1
be the number of different clusters to which "j" belongs (the multiplicity of "j"). The
cluster C[i] contains $N_i$ individuals with the variates V[i, t], t = 1, 2, ..., $N_i$,
i = 1, 2, ..., K. In a sampling procedure, let sub-sample sizes n[i] be given for each C[i],
and weights λ[i, t] for each V[i, t]; a random sample of k clusters C[$i_u$], u = 1, 2, ..., k,
is obtained, then n[$i_u$] individuals are sampled from C[$i_u$], and for each of them its variate
and its multiplicity are recorded. Necessary and sufficient conditions are derived for
$S = \sum_{u} \sum_{v} V[i_u, t_v]\, \lambda[i_u, t_v]$ being an unbiased estimate of
$\bar{V} = \frac{1}{N} \sum_{j=1}^{N} V[j]$. The variance of S is found, the weights are studied
which minimize this variance, and some practically important special cases are derived.

2. A Simple Nonparametric Test of Independence. Nils Blomqvist, University
of Stockholm.

Consider a sample of size n from a two-dimensional distribution F(x, y). Let $\tilde{x}$ and $\tilde{y}$
denote the two sample medians and compute the number of individuals, say k, satisfying
the inequality $x < \tilde{x}$, $y < \tilde{y}$ (the trivial difficulty arising when n is an odd number can
easily be overcome). A test of independence based on k is nonparametric. As a matter of
fact one has under the null hypothesis that

$P(k) = \binom{m}{k}^{2} \Big/ \binom{2m}{m},$

where m = [n/2]. In the case of normal F with correlation coefficient ρ it is possible to
show, by studying the asymptotic behavior of the power function of the test in the
neighborhood of ρ = 0, that the asymptotic efficiency of the test is $(2/\pi)^2$, or about 41%. This
result is based on the fact that k has an asymptotically normal distribution if some
regularity conditions are fulfilled. In spite of its low efficiency it is suggested that the test be
used in cases where some information can be neglected in favor of the simplicity of the
method.
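The statistic k and its null distribution are simple to compute. The sketch below assumes n even (n = 2m) with no ties at the medians, in which case k has the hypergeometric law $\binom{m}{k}^2 / \binom{2m}{m}$ under independence; the function names are illustrative, and the handling of odd n mentioned in the abstract is not attempted.

```python
import math
from fractions import Fraction
from statistics import median


def quadrant_count(xs, ys):
    """Number of sample points lying below both sample medians."""
    mx, my = median(xs), median(ys)
    return sum(1 for x, y in zip(xs, ys) if x < mx and y < my)


def null_pmf(k, m):
    """P(k) under independence for a sample of size n = 2m (no ties)."""
    return Fraction(math.comb(m, k) ** 2, math.comb(2 * m, m))


def upper_tail(k, m):
    """P(K >= k) under independence: a one-sided significance level."""
    return sum(null_pmf(j, m) for j in range(k, m + 1))


if __name__ == "__main__":
    xs = [1.2, 0.4, 3.1, 2.2, 5.0, 4.7]
    ys = [0.9, 0.1, 2.8, 3.0, 4.1, 5.5]
    k = quadrant_count(xs, ys)
    print(k, float(upper_tail(k, len(xs) // 2)), (2 / math.pi) ** 2)
```

Exact fractions are used so that the tail probabilities for small samples carry no rounding error.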


3. On Minimax Statistical Decision Procedures and Their Admissibility. Colin
R. Blyth, University of California, Berkeley.

The problem considered is that of using a sequence of observations on a random variable
X to make a decision. Two loss functions $W_1$ and $W_2$, each depending on the distribution
F of X, the number n of observations taken, and the decision δ made, are assumed given.
Minimax problems can be stated for weighted sums of $W_1$ and $W_2$, or for either one subject
to an upper bound on the expectation of the other. Under suitable conditions it is shown
that solutions of the first type of problem provide solutions for all problems of the latter
types, and that admissibility for a problem of the first type implies admissibility for
problems of the latter types. Two examples are given: estimation of E X when X is (1) normal
with known variance, (2) rectangular with known range. The two loss functions are in
each case $W_1 = n$ and an arbitrary nondecreasing function $W_2(|\theta - \delta|)$. Admissible
minimax estimates are obtained. Extensions to any function $W_1(n)$ are indicated; two
examples are given for the normal case where the sample size must be randomized among
more than a consecutive pair of integers.


4. Sufficient Statistics and Unbiased Estimates for "Selected" Distributions.
Douglas G. Chapman, University of Washington, Seattle.

A family of distributions obtained from any given family by fixed selection may be
called a "selected" family. Tukey's theorem that such selected families admit the same
set of sufficient statistics as the parent family is proved for an extended class of
distributions. Further, if the selection does not involve truncation the existence of minimum
variance unbiased estimates of parameters of the parent family ensures the existence of similar
estimates for the selected family. Some results are derived for minimum variance unbiased
estimates for truncated distributions.

5. The Unattainability of Certain Lower Bounds by Product Densities. R. C.
Davis, U. S. Naval Ordnance Testing Station, China Lake.

Under weak regularity conditions it is shown that for the case in which the sample size
is a nonrandom variable certain lower bounds are unattainable. Consider a univariate
chance variable X, possessing an absolutely continuous distribution function F(x, θ), in
which θ is the unknown parameter. Under quite general regularity conditions Barankin

ssssB-Hlssas 

an nddilional weak aasumption cracer - '' ' . , , (obtained by Barankin) 





variables and for which $\varphi(x_1, x_2, \ldots, x_n)$ attains for each n the special lower bound given
by Barankin. Obviously in the case s = 2, the lower bound is achieved by an efficient
statistic if one exists.

6. A Note on the Power of the Sign Test. T. A. Jeeves and Robert Richards,
University of California, Berkeley.

Values obtained by using the normal approximation to the noncentral t-distribution
given by Johnson and Welch were compared with exact values given by Neyman and
Tokarska. The comparison indicated that efficiencies of the sign test computed from the
approximation would be consistently higher than the true efficiencies. To avoid this bias
the sign test was randomized so that levels of significance of α = .05 and α = .01 were
obtained and the exact values of the noncentral t used. Efficiencies were computed using
various measures of equivalence of the power functions: (1) balancing the area (Walsh),
(2) minimizing the maximum difference, (3) equalizing the power at certain fixed points.
The various measures of equivalence yielded no marked differences in efficiencies. Tables
were given of the efficiencies for small n. The efficiency for α = .05 was about .7 for n
between 6 and 20 and somewhat higher for α = .01. The efficiency slowly approaches the
asymptotic value of 2/π = .6366 as n increases.
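Randomizing the sign test to an exact level, as described, is straightforward to sketch. The code below assumes a one-sided test on the number S of positive signs, normal observations with unit standard deviation, and a shift delta of the median from zero (so that each sign is positive with probability Φ(delta)). These choices and the function names are illustrative; the authors' efficiency tabulation against the t-test is not reproduced.

```python
import math
from statistics import NormalDist


def binom_pmf(j, n, p):
    return math.comb(n, j) * p**j * (1.0 - p) ** (n - j)


def randomized_sign_test(n, alpha):
    """Return (c, gamma): reject when S > c, and with probability gamma
    when S == c, so that the size under p = 1/2 is exactly alpha."""
    tail = 0.0
    for c in range(n, -1, -1):
        pc = binom_pmf(c, n, 0.5)
        if tail + pc > alpha:
            return c, (alpha - tail) / pc
        tail += pc
    raise ValueError("alpha must be less than 1")


def power(n, alpha, delta):
    """Power of the randomized one-sided sign test against a shift delta."""
    p = NormalDist().cdf(delta)
    c, gamma = randomized_sign_test(n, alpha)
    return gamma * binom_pmf(c, n, p) + sum(binom_pmf(j, n, p)
                                            for j in range(c + 1, n + 1))


if __name__ == "__main__":
    for delta in (0.0, 0.25, 0.5, 1.0):
        print(delta, round(power(10, 0.05, delta), 4))
    print(round(2 / math.pi, 4))
```

Without randomization the attainable levels for small n are a coarse set of binomial tail sums, which is why a comparison at exactly .05 or .01 needs the extra chance device.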

7. About Some Classes of Sequential Procedures for Obtaining Confidence
Intervals of Given Length. (Preliminary Report.) Werner R. Leimbacher,
University of California, Berkeley.

The special class $C_1$ of such procedures indicated by A. Wald (Sequential Analysis, John
Wiley & Sons, 1947, pp. 146-156) can be extended by generalizing and improving the
inequality on which the procedures are based. It is shown that even in this larger class $C_2$,
a procedure could possibly be optimum only under very special circumstances. The well
known optimum procedure for a normal distribution N(θ, 1) can be obtained as the limit
of a sequence of procedures from $C_2$. For the suggested sequence, however, the limit no
longer belongs to $C_2$. In order to eliminate various deficiencies of $C_2$, a modified class $C_3$
is proposed which contains the well known optimum procedures for the normal and
rectangular distributions. The method indicated seems suggestive for the general case of
estimating location parameters by confidence intervals.

8. On the Stochastic Independence of Symmetric and Homogeneous Linear
and Quadratic Statistics. Eugene Lukacs, U. S. Naval Ordnance Testing
Station, China Lake.

It is known that the sampling distributions of the mean and of the variance are
stochastically independent if and only if the parent distribution is normal. This was proven by
R. C. Geary (Jour. Roy. Stat. Soc., Suppl., Vol. 3 (1936)) and using a different method by
E. Lukacs (Annals of Math. Stat., Vol. 13 (1942)). The question arises whether there are
any distributions having the property that the sampling distributions of the mean and of a
symmetric and homogeneous quadratic statistic are independent. It can be shown that
there are only the following possibilities: (1) the parent distribution is normal, (2) the
parent distribution is degenerate with a single saltus of one, (3) the parent distribution is
a step function with two steps, located symmetrically with respect to zero, (4) the parent
distribution is a gamma distribution.

9. The Distribution of the Maximum Deviation between Two Sample Cumulative
Step Functions. Frank J. Massey, Jr., University of Oregon.

Let $x_1 < x_2 < \cdots < x_n$ and $y_1 < y_2 < \cdots < y_m$ be the ordered results of two random
samples from populations having continuous cumulative distribution functions F(x) and
G(x) respectively. Let $S_n(x) = k/n$ when k is the number of observed values of X which
are less than or equal to x, and similarly let $S'_m(y) = j/m$ where j is the number of observed
values of Y which are less than or equal to y. The statistic $d = \max |S_n(x) - S'_m(x)|$ can
be used to test the hypothesis F(x) = G(x), where the hypothesis would be rejected if the
observed d is significantly large. In this paper a method of obtaining the exact distribution
of d for small samples is described, and a short table for equal size samples is included.
The general technique is that used by the author for the single sample case. There is a
lower bound to the power of the test against any specified alternative. This lower bound
approaches one as n and m approach infinity, proving that the test is consistent.
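The statistic d itself is simple to compute, since the two step functions change only at observed values. The following is a minimal sketch with illustrative function names; it evaluates both sample cumulative step functions at every observed point and takes the largest absolute difference. The exact distribution and the table described in the abstract are not reproduced.

```python
from bisect import bisect_right


def step_function_value(sorted_sample, x):
    """Proportion of the sample less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def max_deviation(xs, ys):
    """d = max over x of |S_n(x) - S'_m(x)| for two samples."""
    xs, ys = sorted(xs), sorted(ys)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    # Both step functions are constant between observed values, so the
    # maximum is attained at one of the pooled sample points.
    return max(abs(step_function_value(xs, v) - step_function_value(ys, v))
               for v in xs + ys)


if __name__ == "__main__":
    print(max_deviation([1, 2, 3], [4, 5, 6]))
    print(max_deviation([1, 3, 5], [2, 4, 6]))
```

For samples that do not overlap at all the statistic reaches its largest possible value of one.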


10. An Iterative Construction of the Optimum Sequential Decision Procedure
with Linear Cost Function. Lincoln E. Moses, Stanford University.

Where the cost of taking n observations is proportional to n, define a sequential decision
procedure $D_T$ by means of its associated "stopping region" T; T is the set of a posteriori
probability distributions ξ(θ) for which $D_T$ instructs the statistician to take no
observation and to make the decision which minimizes the Bayes risk. Now let $D_T$ be any
sequential decision procedure which has uniformly bounded average risk for every a priori
distribution ξ(θ). Define T' as the derived region of T: T' is the set of ξ(θ) such that the Bayes
risk of stopping at ξ(θ) is not greater than the risk of taking one observation and then
using $D_T$. Define $T^{(n+1)} = (T^{(n)})'$. Then it is shown that the sequence of regions $T^{(n)}$,
n = 1, 2, ..., is monotonically decreasing to a limit region $T^*$, and that $D_{T^*}$ is the optimum
sequential decision procedure. Some numerical examples are given where the exact solution
is obtained and the convergence of the iteration is examined. (This paper was prepared
under the sponsorship of the Office of Naval Research.)


11. On the Law of the Iterated Logarithm for Dependent Random Variables. 
Stanley W. Nash, University of California, Berkeley. 

The order of the remainder term is evaluated in the distribution function of the asymptotically
normal sum $S_n$ of dependent random variables of a certain class considered by
Loève. Bounds are found for the probability that $\max_{1 \le k \le n} |S_k| \ge B_n z$, where $B_n^2$ is the sum
of the variances of components of $S_n$. Given an infinite sequence of events, a necessary
and sufficient condition is found for the probability that infinitely many
occur to equal one. This criterion extends criteria due to Borel. With these results established
the law of the iterated logarithm is shown to hold for a wide subclass of Loève's
class of dependent random variables. Within this class the partial sum may approach
normality with a speed which depends in a certain functional way on the previous
sum $S_i$, and which may be arbitrarily slow for some values of $S_i$. The conclusions generalize
earlier results due to W. Doeblin and N. A. Sapogov.

12. Conditional Expectation and the Efficiency of Estimates. Paul G. Hoel,
University of California, Los Angeles.

... does not yield an essentially better estimate than a well-known
estimate.





13. Optimum Estimates for Location and Scale Parameters. Raymond P.
Peterson, University of California and National Bureau of Standards,
Los Angeles.

Let $h_i(W \mid E, \theta) = W(\theta_i^*, \theta)\, p(E \mid \theta)$, where $p(E \mid \theta)$ is the joint probability density
function of the $n$ (not necessarily independent) sample values $x_1, \cdots, x_n$, which may be
represented as a point $E = (x_1, \cdots, x_n)$ in the $n$-dimensional Euclidean sample space $M$.
The unknown parameters $\theta_1, \cdots, \theta_s$ may be represented as a point $\theta = (\theta_1, \cdots, \theta_s)$
in the $s$-dimensional Euclidean parameter space $\Omega$. $W(\theta_i^*, \theta)$ is a real-valued, nonnegative,
measurable weight function, defined for all $E$ in $M$ and $\theta$ in $\Omega$, which represents the relative
seriousness of taking the estimate $\theta_i^*(E)$ as the value of $\theta_i$, for any particular sample
point $E$. Let $G(\theta)$ be the unknown cumulative distribution function of $\theta$. Then $\theta_i^*(E)$ is
defined to be a best estimate of $\theta_i$, provided that, if $\theta_i(E)$ is any other estimate of $\theta_i$ in
the class under consideration, $I - I^* \ge 0$, where

$$I = \int_\Omega \int_M h_i(W \mid E, \theta)\, dE\, dG(\theta).$$

Let

$$r_i(\theta) = \int_M h_i(W \mid E, \theta)\, dE, \qquad \varphi_i(E) = \int_\Omega h_i(W \mid E, \theta)\, d\theta.$$

A general theorem is proved to the effect that if $h_i(W \mid E, \theta)$ is measurable over the product
space $M \times \Omega$ and if $r_i(\theta)$ and $\varphi_i(E)$ are uniformly convergent integrals, then a best estimate
$\theta_i^*(E)$ of $\theta_i$ exists provided that $r_i(\theta)$ is constant and that $\theta_i^*(E)$ minimizes $\varphi_i(E)$ for all
points $E$ in $M$. General methods are obtained for constructing best estimates for location
and scale parameters, separately or jointly, and for functions of location and scale parameters
from several populations. As special cases, results are derived which are analogous
to converses of Theorems 1 and 3 in Kallianpur's "Minimax Estimates of Location and
Scale Parameters", Abstract (Annals of Math. Stat., Vol. 21 (1950), pp. 310-311).


NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Professor William Feller of Cornell University has been appointed Eugene 
Higgins Professor of Mathematics at Princeton University. 

Dr. Leonard Kent, formerly on the staff at the University of Chicago in the 
School of Business, is now with the firm of Alderson and Sessions, 1906 Walnut 
Street, Philadelphia 3, Pennsylvania. 

Dr. G. B. Oakland has resigned an associate professorship of statistics at the 
University of Manitoba to accept the position as Head of Biometrics Unit,
Division of Administration, Department of Agriculture, Ottawa. 

Dr. Norman Rudy has accepted an appointment as Assistant Professor at 
Sacramento State College, Sacramento, California. 

Professor G. R. Seth has returned to India to accept the position of Professor 
of Statistics and Deputy Statistical Advisor to the Indian Council of Agricultural 
Research, New Delhi. 





Mr. Eric Weyl, textile engineering consultant, formerly of Manchester, New 
Hampshire, has moved his office to 2509 Vail Avenue, Charlotte, North Carolina. 
Mr. Weyl, a specialist in cotton spinning, serves as regular consultant to many 
leading textile mills. 


The completion and successful operation of SEAC (the National Bureau of
Standards Eastern Automatic Computer) has been achieved by electronic scientists
of the National Bureau of Standards. SEAC is a high-speed, general-purpose,
automatically-sequenced electronic computer. It was developed and constructed, 
in a period of 20 months, by the staff of the National Bureau of Standards under 
the sponsorship of the Department of the Air Force to provide a high-speed 
computing service for Air Force Project SCOOP (Scientific Computation of 
Optimum Programs), a pioneering effort in the application of scientific principles 
to the large-scale problems of military management and administration. SEAC 
will also be available for solving important NBS problems of general scientific 
and engineering interest. 


New Members 

The following persons have been elected to membership in the Institute 
(June 1, 1950 to August 31, 1950)

Aven, Russell E., M.A. (Univ. of Miss.), Graduate student, University of Mississippi,
1611 North Main St., Water Valley, Mississippi.

Bamberger, Gunter, Dipl.-Math. (Univ. Göttingen), Division head in the Statistical Office
of the City of Cologne, Manderscheider Platz IS, Cologne-Sulz, Germany.

Bangdiwala, Ishver S., M.S. (Univ. N. C.), Graduate student, University of North Carolina,
SIO A Phillips Hall, University of North Carolina, Chapel Hill.

Borch, Karl Henrik, M.Sc. (Oslo Univ.), Field Science Officer for Middle East, UNESCO,
19 Avenue Kléber, Paris 16e, France.

Buch, Kai R., M.Sc., Assistant Professor, Technical University of Denmark, Eigaardsveg
14 A, Charlottenlund, Denmark.

Carranza, Roque G., Ingeniero Industrial (Univ. Buenos Aires), Consultant Industrial
Engineer, Parana 66, Buenos Aires, Argentina.

Dominguez, Alberto G., Ph.D. (Univ. Buenos Aires), Professor of Mathematics, Facultad
de Ciencias Exactas, Fisicas y Naturales, University of Buenos Aires, Paraguay 1SS7,
Buenos Aires, Argentina.

Dunaway, William L., B.S. (Univ. of Calif.), Graduate student, Dept. of Mathematical
Statistics, University of California, 4SS0 Cahuenga Boulevard, North Hollywood, California.

Fernandez, Jose J., Professor, University of Costa Rica, Ap. ISIS, San Jose, Costa Rica.

Fortet, Robert, Ph.D. (Paris), Professor, Department of Science de Caen, 168 Rue Capo-

D ■jr'of OL^n), «1 F,„kr„,., H„d 

Mathematics and Dean of the Faculty of Natural Sciences and Mathematics, University





of Freiburg i. Br., Manager of "Gesellschaft für Mathematik und Mechanik",
.'<lmll!itnmr S?, Freiburg i. Br., Germany.

Guilbaud, George T., Agrégé de l'Université, Chief, Section at Institute of Science
Economique Appliquée, Paris, and Professor, Institute of Statistics, University of
Paris, .W Boulevard des Capucines, Paris S, France.

Holloway, Clark, Jr., M.S. (Univ. of Ill.), Process Research Engineer, Gulf Research and
Development Co., /’ H Pittsburgh W), Pennsylvania.

Lieberman, Gilbert, M.A. (Columbia Univ.), Mathematician, U. S. Naval Research Laboratory,
dSO Newcomb St., S.E., Washington SO, D. C.

Lomax, K. S., M.A. (Manchester Univ.), Lecturer in Economic Statistics, Economics
Department, The University, Manchester, England.

Lorenz, Paul, Ph.D., Professor, University of Berlin, Kaiserstuhlstrasse 21,
Berlin-Schlachtensee, Germany.

Lunger, George F., AI.M.A. (Univ. of Mich.), Statistician, Great Lakes Investigations, Fish
and Wildlife Service, Department of the Interior, Sill) Arbor View Blvd., Ann Arbor,
Michigan.

Maggy, Robert K., M.A. (Univ. of Calif.), Graduate student, University of California,
IGHB Euclid Avenue, Berkeley I), California.

McElrath, Gayle W., M.S. (Univ. of Mich.), Assistant Professor, Department of Engineering,
2flS Main Engineering Building, University of Minnesota, Minneapolis, Minnesota.

Neisius, W. Vincent, M.S. (Emory Univ.), Mathematics Instructor, Georgia Institute of
Technology, S07 St. Charles Avenue, N.E., Atlanta />, Georgia.

Perloff, Robert, M.A. (Ohio State Univ.), Graduate student and Research Assistant,
Research Foundation, Ohio State University, I3S1 Jin/ilcn Road, Columbus 5, Ohio.

Peter, Hans, Dr. rer. pol., Professor of Economics, University of Tübingen,
Tübingen-Waldhausen 29, Germany.

Putter, Joseph, M.Sc. (Hebrew Univ., Jerusalem), International House, Berkeley 4,
California.

Rankin, Bayard, A.B. (Univ. of Calif.), Graduate student, University of California,
International House, Berkeley 4, California.

Reid, Albert T., B.S. (Iowa State College), Research Assistant in Mathematical Biology,
Committee on Mathematical Biology, University of Chicago, 5741 Drexel Avenue,
Chicago 37, Illinois.

Shaw, Albert, B.S. (Univ. of Alberta), Lecturer, University of Alberta, Department of
Mathematics, University of Alberta, Edmonton, Alberta, Canada.

Shuhany, Elizabeth, A.M. (Boston Univ.), Assistant Instructor in Statistics and Assistant
in Statistical Laboratory of Mathematics, Boston University, 725 Commonwealth
Avenue, Boston 16, Massachusetts.

Stewart, John N., B.A. (Univ. of Michigan), Graduate student, University of Michigan,
4894 Chatsworth, Detroit 24, Michigan.

Strecker, Heinrich, Doctor der Naturwissenschaften (Univ. München), Mathematical
Statistician in the Bavarian Statistical Office, Rosenheimerstrasse 130, Munich 8,
Germany.

Vaswani, Sundri (Miss), Ph.D. (Univ. of London), Research Associate in Statistics, c/o
Ahmedabad Textile Industry's Research Association, P.O. Box 170, Ahmedabad, India.


REPORT OF THE BERKELEY MEETING OF THE INSTITUTE 

The forty-fourth meeting of the Institute of Mathematical Statistics was 
held on August 5, 1950, on the Berkeley campus of the University of California,
in conjunction with the Second Berkeley Symposium on Mathematical Statistics 





and Probability which met from July 31 through August 12. Other organizations
cooperating with the Symposium were the Biometrics Section of the American
Statistical Association, The Western North American Region of the Biometric 
Society, the Econometric Society, the Institute of Transportation and Traffic 
Engineering of the University of California, and the Office of Naval Research. 
Some 218 persons registered for the Symposium, including the following 106 
members of the Institute: 


T. W. Anderson, Fred C. Andrews, Jane F. Andrian, Kenneth J. Arrow, Edward W.
Barankin, Helen P. Beard, Robert D. Bedwell, Blair M. Bennett, Joseph Berkson, Z. W.
Birnbaum, David Blackwell, E. Blanco, Nils Blomqvist, Julius R. Blum, Colin R. Blyth,
A. H. Bowker, George W. Brown, Douglas G. Chapman, C. L. Chiang, K. L. Chung, William
G. Cochran, Harald Cramér, Edwin L. Crow, J. H. Curtiss, R. C. Davis, W. J. Dixon, J. L.
Doob, A. Dvoretzky, Mary Elveback, Benjamin Epstein, Mark W. Eudey, Edward A. Fay,
William Feller, Edgar H. Fickenscher, E. Fix, William R. Gaffey, Robert S. Gardner, S. G.
Ghurye, M. A. Girshick, Paul Gutt, Jack C. Gysbers, T. E. Harris, J. L. Hodges, Jr., Wassily
Hoeffding, Paul G. Hoel, Harold Hotelling, John M. Howell, Harry M. Hughes, R. F.
Jarrett, T. A. Jeeves, Mark Kac, Joseph Kampé de Fériet, E. S. Keeping, Ryoichi Kikuchi,
Wilfred M. Kincaid, H. S. Konijn, Charles H. Kraft, George M. Kuznets, E. L. Lehmann,
Roy B. Leipnik, Paul Lévy, M. Loève, Arvid T. Lonseth, Eugene Lukacs, C. A. Magwire,
Jacob Marschak, Thomas Marschak, F. J. Massey, Jr., A. M. Mood, Lincoln E. Moses,
James T. McWilliam, Stanley W. Nash, J. Neyman, Howard O. Nielson, Gottfried E.
Noether, Stefan Peters, John C. Petersen, Raymond P. Peterson, Robert I. Piper, Joseph
Putter, Robert R. Putz, Bayard Rankin, Fred D. Rigby, David Rubinstein, Elizabeth L.
Scott, Esther Seiden, Arthur Shapiro, Richard H. Shaw, Ronald W. Shephard, W. B. Simpson,
Monroe Sirken, M. Sobel, Herbert Solomon, A. L. Stewart, Donald E. Stiling, G.
Szegö, Robert Tate, William F. Taylor, Leo J. Tick, A. W. Tucker, Elizabeth Vaughan,
Shanti A. Vora, Abraham Wald, Allen Wallis, J. Wolfowitz, Miriam L. Yevick.

Because of the extensive program of more than fifty invited addresses at the 
Symposium, the Institute meeting was devoted only to contributed papers. 
Professor David Blackwell of Howard and Stanford Universities presided at
the Institute meeting, at which the following program was presented: 


1. Sampling from Populations with Overlapping Clusters. Z. W. Birnbaum, University of
Washington, Seattle.
2. A Simple Nonparametric Test of Independence. Nils Blomqvist, University of Stockholm.
3. On Minimax Statistical Decision Procedures and their Admissibility. Colin R. Blyth,
University of California, Berkeley.
4. Sufficient Statistics and Unbiased Estimates for "Selected" Distributions. Douglas G.
Chapman, University of Washington, Seattle.
5. The Unattainability of Certain Lower Bounds by Product Densities. R. C. Davis, U. S.
Naval Ordnance Testing Station, China Lake.
6. A Note on the Power of the Sign Test. T. A. Jeeves and Robert Richards, University ...
7. ... Classes of Sequential Procedures for Obtaining Confidence Intervals of Given
Length. (Preliminary report). Werner R. Leimbacher, University of California, Berkeley.
8. On the Stochastic Independence of Symmetric and Homogeneous Linear and Quadratic
Statistics. Eugene Lukacs, U. S. Naval Ordnance Testing Station, China Lake.





9. The Distribution of the Maximum Deviation between Two Sample Cumulative Step Functions.
Frank J. Massey, Jr., University of Oregon.
10. An Iterative Construction of the Optimum Sequential Decision Procedure with Linear
Cost Function. Lincoln E. Moses, Stanford University.
11. On the Law of the Iterated Logarithm for Dependent Random Variables. Stanley W.
Nash, University of California, Berkeley.
12. Conditional Expectation and the Efficiency of Estimates. (By title). Paul G. Hoel,
University of California, Los Angeles.
13. Optimum Estimates for Location and Scale Parameters. (By title). Raymond P. Peterson,
University of California and National Bureau of Standards, Los Angeles.

The social activities at the Symposium included a tea on August 1, an excursion
on August 3, a dinner on August 7, a picnic on August 9, and coffee on
July 31 and August 2, 4, 7, 8, 10, and 11.

J. L. Hodges, Jr.,
Associate Secretary




